
Deep Learning Approach for Intelligent Intrusion Detection System

Article in IEEE Access · April 2019
DOI: 10.1109/ACCESS.2019.2895334


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2019.2895334, IEEE Access

Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2017.DOI

Deep Learning Approach for Intelligent Intrusion Detection System
VINAYAKUMAR R1, MAMOUN ALAZAB2, (Senior Member, IEEE), SOMAN KP1, PRABAHARAN POORNACHANDRAN3, AMEER AL-NEMRAT4, and SITALAKSHMI VENKATRAMAN5
1 Center for Computational Engineering and Networking (CEN), Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, India
2 Charles Darwin University, Australia
3 Centre for Cyber Security Systems and Networks, Amrita School of Engineering, Amritapuri, Amrita Vishwa Vidyapeetham, India
4 School of Architecture, Computing and Engineering (ACE), University of East London
5 Melbourne Polytechnic, Australia
Corresponding author: Vinayakumar R (e-mail: [email protected]).
This research was supported in part by Paramount Computer Systems and Lakhshya Cyber Security Labs. We are grateful to NVIDIA India for the GPU hardware support provided through a research grant. We are also grateful to the Computational Engineering and Networking (CEN) department for encouraging this research.

ABSTRACT Machine learning techniques are widely used to develop intrusion detection systems (IDS) for detecting and classifying cyber-attacks at the network level and host level in a timely and automatic manner. However, many challenges arise, since malicious attacks are continually changing and occur in very large volumes, requiring a scalable solution. Various malware datasets are publicly available for further research by the cyber security community. However, no existing study has presented a detailed analysis of the performance of various machine learning algorithms across these publicly available datasets. Owing to the dynamic nature of malware, with continuously changing attack methods, the publicly available malware datasets need to be updated systematically and benchmarked. In this paper, a deep neural network (DNN), a type of deep learning model, is explored to develop a flexible and effective IDS to detect and classify unforeseen and unpredictable cyber-attacks. The continuous change in network behaviour and the rapid evolution of attacks make it necessary to evaluate the various datasets generated over the years through static and dynamic approaches. This type of study facilitates identifying the best algorithm for effectively detecting future cyber-attacks. A comprehensive evaluation of experiments with DNNs and other classical machine learning classifiers is presented on various publicly available benchmark malware datasets. The optimal network parameters and network topologies for the DNNs are chosen by following hyperparameter selection methods on the KDDCup 99 dataset. All DNN experiments are run for up to 1,000 epochs, with the learning rate varying in the range [0.01, 0.5]. The DNN model that performed well on KDDCup 99 is applied to other datasets, such as NSL-KDD, UNSW-NB15, Kyoto, WSN-DS and CICIDS 2017, to conduct the benchmark. Our DNN model learns abstract, high-dimensional feature representations of the IDS data by passing them through many hidden layers. Through rigorous experimental testing, it is confirmed that DNNs perform well in comparison to the classical machine learning classifiers. Finally, we propose a highly scalable hybrid DNN framework called Scale-Hybrid-IDS-AlertNet (SHIA), which can be used in real time to effectively monitor network traffic and host-level events and to proactively alert on possible cyber-attacks.

INDEX TERMS Cyber Security, Intrusion Detection, Malware, Big Data, Machine Learning, Deep Learning, Deep Neural Networks, Cyber-attacks, Cybercrime

I. INTRODUCTION

Information and communications technology (ICT) systems and networks handle various sensitive user data that are prone to attacks from both internal and external intruders [1]. These attacks can be manual or machine generated, are diverse, and are gradually advancing in obfuscation, resulting in undetected data breaches. For instance, the Yahoo data breach caused a loss of $350M and the Bitcoin breach

2169-3536 (c) 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
resulted in a rough estimate of $70M loss [2]. Such cyber-attacks are constantly evolving with very sophisticated algorithms, aided by advancements in hardware, software, and network topologies, including the recent developments in the Internet of Things (IoT) [4]. Malicious cyber-attacks pose serious security issues that demand a novel, flexible and more reliable intrusion detection system (IDS). An IDS is a proactive intrusion detection tool used to detect and classify intrusions, attacks, or violations of security policies automatically at the network-level and host-level infrastructure in a timely manner. Based on intrusive behaviours, intrusion detection is classified into network-based intrusion detection systems (NIDS) and host-based intrusion detection systems (HIDS) [5]. An IDS that uses network behaviour is called a NIDS. The network behaviours are collected using network equipment, via mirroring by networking devices such as switches, routers, and network taps, and are analysed in order to identify attacks and possible threats concealed within the network traffic. An IDS that uses system activities, in the form of various log files on the local host computer, in order to detect attacks is called a HIDS. The log files are collected via local sensors. While a NIDS inspects the contents of each packet in network traffic flows, a HIDS relies on the information in log files, which include sensor logs, system logs, software logs, file systems, disk resources, user account information and other details of each system. Many organizations use a hybrid of both NIDS and HIDS.

Analysis of network traffic flows is done using misuse detection, anomaly detection and stateful protocol analysis. Misuse detection uses predefined signatures and filters to detect attacks. It relies on human input to constantly update the signature database. This method is accurate in finding known attacks but is completely ineffective against unknown attacks. Anomaly detection uses heuristic mechanisms to find unknown malicious activities. In most scenarios, anomaly detection produces a high false positive rate [5]. To combat this problem, most organizations use a combination of both misuse and anomaly detection in their commercial solution systems. Stateful protocol analysis is the most powerful of the aforementioned detection methods, since it acts on the network, application and transport layers. It uses predefined vendor specification settings to detect deviations of the appropriate protocols and applications. Though deep learning approaches have recently been considered to enhance the intelligence of such intrusion detection techniques, there is a lack of studies benchmarking such machine learning algorithms on publicly available datasets. The most common issues in the existing solutions based on machine learning models are: firstly, the models produce a high false positive rate [3], [5] over a wide range of attacks; secondly, the models are not generalizable, as existing studies have mainly used only a single dataset to report the performance of the machine learning model; thirdly, the models studied so far have been completely untested on today's huge network traffic; and finally, the solutions are required to cope with today's rapidly increasing high-speed network size, speed and dynamics. These challenges form the prime motivation for this work, with a research focus on evaluating the efficacy of various classical machine learning classifiers and deep neural networks (DNNs) applied to NIDS and HIDS. This work makes the following assumptions:

• An attacker aims to masquerade as a normal user to remain hidden from the IDS. However, the patterns of intrusive behaviours differ in some aspects, owing to the specific objective of an attacker, for example gaining unauthorised access to computer and network resources.
• The usage patterns of network resources can be captured; however, the existing methods end up with a high false positive rate.
• The patterns of intrusions exist in normal traffic with a very low profile over a long time interval.

Overall, this work makes the following contributions to the cyber security domain:

• By combining both NIDS and HIDS collaboratively, an effective deep learning approach is proposed by modelling a deep neural network (DNN) to detect cyber-attacks proactively. In this study, the efficacy of various classical machine learning algorithms and DNNs is evaluated on various NIDS and HIDS datasets in identifying whether network traffic behaviour is normal or abnormal due to an attack, which can then be classified into the corresponding attack categories.
• The advanced text representation methods of natural language processing (NLP) are explored with host-level events, i.e. system calls, with the aim of capturing contextual and semantic similarity and preserving the sequence information of system calls. The comparative performance of these methods is evaluated on the ADFA-LD and ADFA-WD datasets.
• This study uses various benchmark datasets to conduct a comparative experimentation. This is mainly because each dataset suffers from various issues, such as data corruption, limited traffic variety, inconsistencies, and a lack of contemporary attacks.
• A scalable hybrid intrusion detection framework called SHIA is introduced to process a large amount of network-level and host-level events to automatically identify malicious characteristics and provide appropriate alerts to the network admin. The proposed framework is highly scalable on commodity hardware servers, and by joining additional computing resources to the existing framework, its performance can be further enhanced to handle big data in real-time systems.

The code and detailed results are made publicly available [7] for further research. The remainder of the paper is organized as follows. Section II discusses the various stages of compromise from an attacker's perspective. Section III discusses related research work on NIDS and HIDS. Information on the scalable framework, the mathematical details of DNNs and the text representation
methods for intrusion detection are presented in Section IV. Section V covers the major shortcomings of IDS datasets, the problem formulation and the statistical measures. Section VI describes the datasets. Sections VII and VIII present the experimental analysis and a brief overview of the proposed system and architecture design. Section IX presents the experimental results. Conclusions, future work directions and discussions are placed in Section X.

II. STAGES OF COMPROMISE: AN ATTACKER'S VIEW

Mostly, intrusions are initiated by unauthorized users called attackers. An attacker can attempt to access a computer remotely via the Internet or to make a service remotely unusable. Detecting an intrusion accurately requires understanding how a system is successfully attacked. Generally, an attack can be classified into five phases: reconnaissance, exploitation, reinforcement, consolidation, and pillage. An attack can be detected during the first three phases; however, once it reaches the fourth or fifth phase, the system will be fully compromised, and it becomes very difficult to distinguish between normal behaviour and an attack. During the reconnaissance phase, an attacker tries to collect information related to reachable hosts and services, as well as the versions of the operating systems and applications that are running. During the exploitation phase, an attacker utilizes a particular service with the aim of accessing the target computer. A service may be abused, subverted, or breached: abusing a service includes stolen passwords or dictionary attacks, and subversion includes SQL injection. After an illegal forced entry to a system, an attacker performs camouflage activity and then installs supplementary tools and services to take advantage of the privileges gained during the reinforcement phase. Based on the misused user account, an attacker tries to gain full system access. Finally, an attacker utilizes the applications that are accessible from the available user account. An attacker obtains complete control over the system in the consolidation phase, and the installed backdoor is used for communication purposes during this phase. The final phase is pillage, where an attacker's possible malicious activities include theft of data and CPU time, and impersonation.

Since computers and networks are assembled and programmed by humans, there are possibilities of bugs in both the hardware and software. These human errors and bugs can lead to vulnerabilities [8]. Confidentiality, data integrity and availability are the main pillars of information security. Authenticity and accountability also play an important role in information security. Generally, attacks against confidentiality are passive attacks, for example eavesdropping; attacks against integrity are active attacks, for example system scanning attacks, i.e. 'Probe'; and attacks against availability aim at bringing network resources down so that they become unavailable to normal users, for example denial of service ('DoS') and distributed denial of service ('DDoS'). IDS systems have limited capability to detect attacks related to eavesdropping. A 'Probe' attack can be launched either over a network or locally within a system. An attack can now be defined as a set of actions that potentially compromises the confidentiality, data integrity, availability, or any other security policy of a resource. Primarily, an IDS aims at detecting all these types of attacks to protect computers and networks from malicious activities. In this work, we follow the categorization scheme suggested by the DARPA Intrusion Detection Evaluation.

III. RELATED WORKS

Research on security issues relating to NIDS and HIDS has existed since the birth of computer architectures. In recent years, applying machine learning based solutions to NIDS and HIDS has been of prime interest among security researchers and specialists. A detailed survey on existing machine learning based solutions is given by [5]. This section presents a panorama of the studies to date that explore machine learning and deep learning approaches applied to enhance NIDS and HIDS.

A. NETWORK-BASED INTRUSION DETECTION SYSTEMS (NIDS)

Commercial NIDS primarily use either statistical measures or computed thresholds on feature sets such as packet length, inter-arrival time, flow size and other network traffic parameters to model them within a specific time window [6]. They suffer from high rates of false positive and false negative alerts. A high rate of false negative alerts indicates that the NIDS frequently fails to detect attacks, and a high rate of false positive alerts means that the NIDS raises unnecessary alerts when no attack is actually taking place. Hence, these commercial solutions are ineffective against present-day attacks.

A self-learning system is one of the effective methods for dealing with present-day attacks. It uses supervised, semi-supervised and unsupervised machine learning mechanisms to learn the patterns of various normal and malicious activities from a large corpus of Normal and Attack network-level and host-level events. Though various machine learning based solutions are found in the literature, their applicability to commercial systems is at an early stage [9]. The existing machine learning based solutions output a high false positive rate with high computational cost [3]. This is because machine learning classifiers learn the characteristics of simple TCP/IP features locally. Deep learning is a complex subset of machine learning that learns hierarchical feature representations and hidden sequential relationships by passing the TCP/IP information through several hidden layers. Deep learning has achieved significant results in long-standing artificial intelligence (AI) tasks in the fields of image processing, speech recognition, natural language processing (NLP) and many others [10]. Additionally, these successes have been transferred to various cyber security tasks such as intrusion detection, Android malware classification, traffic analysis, network traffic prediction, ransomware detection, encrypted
text categorization, malicious URL detection, anomaly detection, and malicious domain name detection [11]. This work focuses on analyzing the effectiveness of various classical machine learning classifiers and deep neural networks (DNNs) for NIDS with the publicly available network-based intrusion datasets KDDCup 99, NSL-KDD, Kyoto, UNSW-NB15, WSN-DS and CICIDS 2017.

A large body of academic research has used the de facto standard benchmark data, KDDCup 99, to improve the intrusion detection rate. KDDCup 99 was used for the Third International Knowledge Discovery and Data Mining Tools Competition, and the data was created as a processed form of the tcpdump data of the 1998 DARPA intrusion detection (ID) evaluation network. The aim of the contest was to create a predictive model to classify network connections into two classes, Normal or Attack, with attacks categorized into denial of service ('DoS'), 'Probe', remote-to-local ('R2L') and user-to-root ('U2R') categories. The mining audit data for automated models for ID (MADAMID) framework was used for feature construction in the KDDCup 99 competition [17]. MADAMID outputs 41 features: the first 9 features are basic features of a packet, 10-22 are content features, 23-31 are traffic features, and 32-41 are host-based features. The available datasets are: (1) the full dataset and (2) the complementary 10% data. The detailed evaluation results of the KDDCup 98 and KDDCup 99 challenges were published in [3]. In total, 24 entries were submitted to KDDCup 98; the 3 winning entries used variants of decision trees and showed only marginal statistical significance in performance over the others. The 9th-placed entry in the contest used the 1-nearest neighbour classifier. The first significant difference in performance was found between the 17th and 18th entries, implying that the methods of the first 17 submissions were robust; these were profiled by [3]. The Third International Knowledge Discovery and Data Mining Tools Competition task has remained a baseline, and since this contest many machine learning solutions have been proposed. Most of the published results used only the 10% training and testing data, and a few used custom-built datasets. Recently, a comprehensive literature survey on machine learning based ID with the KDDCup 99 dataset was conducted [18].

After the challenge, most of the published results on KDDCup 99 have used several feature engineering methods for dimensionality reduction [18]. While a few studies employed custom-built datasets, the majority used the same dataset for newly available machine learning classifiers [18]. These published results are only partially comparable to the results of the KDDCup 99 contest.

In [19], the classification model consists of two stages: i) a P-rules stage to predict the presence of the class, and ii) an N-rules stage to predict the absence of the class. This performed well in comparison with the aforementioned KDDCup 99 results, except for the user-to-root ('U2R') category. In [20], the significance of feature relevance analysis was investigated for IDS with the most widely used dataset, KDDCup 99. For each feature, they were able to express the feature relevance in terms of information gain, and in addition they presented the most relevant features for each class label. [21] discussed random forest techniques in misuse detection by learning patterns of intrusions, in anomaly detection with an outlier detection mechanism, and in hybrid detection by combining both misuse and anomaly detection. They reported that the misuse approach worked better than the winning entries of the KDDCup 99 challenge, and that the anomaly detection worked better than other published unsupervised anomaly detection methods. Overall, it was concluded that the hybrid system enhances the performance with the advantage of combining both the misuse and anomaly detection approaches [22], [23], [72]. In [24], an ID algorithm using the AdaBoost technique was proposed that used decision stumps as weak classifiers. Their system performed better than other published results, with a lower false-alarm rate, a higher detection rate, and a computationally faster algorithm. However, its drawback is that it failed to adopt an incremental learning approach. In [25], the performance of the shared nearest neighbour (SNN) based model for ID was studied and reported as the best algorithm, with a high detection rate. With the reduced dataset, they were able to conclude that SNN performed well in comparison to K-means for the 'U2R' attack category. However, their work failed to show results on the entire testing dataset.

In [26], Bayesian networks for ID were explored using naive Bayesian networks, with a root node representing the class of a connection and leaf nodes representing the features of a connection. Later, [27] investigated the application of naive Bayes networks to ID and, through detailed experimental analysis, showed that Bayesian networks performed equally well, and sometimes even better, in the 'U2R' and 'Probe' categories in comparison with the winning entries of the KDDCup 99 challenge. In [28], a non-parametric density estimation method based on Parzen-window estimators with Gaussian kernels and the normal distribution was studied. Without the intrusion data, their system compared favourably to the existing winning entries based on ensembles of decision trees. In [29], a genetic algorithm based NIDS was proposed that facilitates modelling both temporal and spatial information to identify complex anomalous behaviour. An overview of ensemble learning techniques for ID was given in [30], and swarm intelligence techniques for ID using ant colony optimization, ant colony clustering and particle swarm optimization were studied in [31]. A comparative study of such research works shows that descriptive statistics were predominantly used.

Overall, a comprehensive literature review shows that very few studies use modern deep learning approaches for NIDS, and the commonly used benchmark datasets for experimental analysis are KDDCup 99 and NSL-KDD [32], [33], [34], [3]. An IDS based on a recurrent neural network (RNN) outperformed other classical machine learning classifiers in identifying intrusions and intrusion types on the NSL-KDD dataset [32]. A two-level approach was proposed for IDS in which the first level extracts the optimal features using a sparse autoencoder in an unsupervised way, and classification is done using softmax regression [33]. The application of a stacked autoencoder was proposed for optimal feature extraction in an unsupervised way, where the proposed method is completely non-symmetric and classification was done using random forest. A novel long short-term memory (LSTM) architecture was proposed, and by modelling the network traffic information as a time series it obtained better performance. The proposed method performed well compared to all the existing methods, as well as the KDDCup 98 and 99 challenge entries [3]. The performance of various RNN types was evaluated by [34]. Various deep learning architectures and classical machine learning algorithms were evaluated for anomaly-based ID on the NSL-KDD dataset [74]. The configuration of an SVM was formulated as a bi-objective optimization problem and solved using a hyper-heuristic framework. The performance was evaluated for malware and anomaly ID. The proposed framework is very suitable for big data cyber security problems [75]. To enhance the anomaly-based ID rate, spatial and temporal features were extracted using convolutional neural network and long short-term memory architectures. The performance was shown on both the KDDCup 99 and ISCX 2012 datasets [76]. A two-step attack detection method was proposed, along with a secure communication protocol, for big data systems to identify insider attacks. In the first step, process profiling was done independently at each node, and in the second step process matching was done using hash matching and consensus [77]. An online detection and estimation method was proposed for smart grid systems [78]. The method is specifically designed for identifying false data injection and jamming attacks in real time, and it additionally provides online estimates of the unknown and time-varying attack parameters and recovered state estimates [78]. A scalable framework for ID over vehicular ad hoc networks was proposed. The framework uses distributed machine learning, i.e. the alternating direction method of multipliers (ADMM), to train a machine learning model in a distributed way to learn whether an activity is normal or an attack [79].

B. HOST-BASED INTRUSION DETECTION SYSTEMS (HIDS)

Various software tools such as Metasploit, Sqlmap, Nmap and Browser Exploitation provide the necessary framework to examine and gather information about target system vulnerabilities. Malicious attackers use such information to launch attacks against various applications such as FTP servers, web servers, SSH servers, etc. Existing methods such as firewalls, cryptographic methods and authentication aim to defend host systems against such attacks. However, these solutions have limitations, and malicious attackers are able to gain unauthorized access to the system. To address this, a typical HIDS operates at the host level by analysing and monitoring all activities on the system application files, system calls and operating system [73]. These types of activities are typically called audit trails. A system call of an operating system is a key interface between the core kernel functions and low-level system applications. Since an application communicates with the operating system via system calls, their behaviour, ordering, type and length generate a unique trace. This can be used to distinguish between known and unknown applications [12]. The system calls of normal and intrusive processes are entirely different; thus, analysis of those system calls provides significant information about the processes of a system. Various feature engineering approaches have been used for system call based process classification: N-grams [13], [12], sequence grams [14] and pair grams [15]. An important advantage of HIDS is that it provides detailed information about the attacks.

The three main components of HIDS, namely the data source, the sensor, and the decision engine, play an important role in detecting security intrusions. The sensor component monitors changes in the data source, and the decision engine uses the machine learning module to implement the intrusion detection mechanism. However, benchmarking the data source component requires much investigation.

Compiling the KDDCup 99 dataset involved a data source component with system calls, and the Sequence Time-Delay Embedding (STIDE) approach was used to analyze fixed-length patterns of system calls to distinguish between normal and anomalous behaviours [13]. A large number of decision engines have been used to analyse patterns of system calls to detect intrusions. Such a data source is most commonly used among the cyber security research community. Apart from system calls, since the Windows operating system (OS) does not provide direct access to system calls, log entries [35] and registry entry manipulations [36] form the other two most commonly used data sources. This work focuses on the decision engine component to benchmark the data source.

Classical methods aim to characterize the nature of the host activity by analyzing the patterns in the sequence of system calls. While STIDE was the most commonly used simple algorithm, Support Vector Machines (SVMs), Hidden Markov Models (HMMs) and Artificial Neural Networks (ANNs) are more recently adopted, more complex methods. In [37], an N-gram feature extraction approach was used for compiling the ADFA-LD system call data, and the N-gram features were passed to different classical machine learning classifiers to identify and categorize attacks. In [39], in order to reduce the dimensionality of system calls, K-means and KNN were experimented with using a frequency based model. A revised version of the N-gram model was used in [38] to represent system calls with various classical machine learning classifiers for both binary and multi-class categories. An approach for HIDS based on N-gram system call representations with various classical machine learning classifiers was proposed in [40]. To reduce the dimensionality of the N-grams, dimensionality reduction methods were employed. In [41], a frequency distribution based feature engineering approach with machine learning algorithms was explored to handle zero-day and stealth attacks in Windows OS. In [42], an ensemble approach for HIDS was proposed using language modelling to reduce the
2169-3536 (c) 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2019.2895334, IEEE Access
false alarm rates, which are a drawback of classical methods. This method leveraged the semantic meaning and communications of system calls. The effectiveness of their methods was evaluated on three different publicly available datasets.
Overall, the published results are limited in detecting intrusions and cyber-attacks using HIDS. Studies that show an increase in the detection rate of intrusions and cyber-attacks also show an increase in the false alarm rate.
The pros and cons of NIDS and HIDS and their efficacy are discussed in detail in [16]. The major advantages of HIDSs are that they facilitate the detection of local attacks and are unaffected by the encryption of network traffic. The major disadvantage is that they need all the configuration files to identify an attack, which is a daunting task due to the huge amount of data. Allowing access to big data technology in the domain of cyber security, particularly IDS, is of paramount importance. The motivation of this research is to develop a novel scalable platform with a hybrid framework of NIDS and HIDS, which is capable of handling a large amount of data with the aim of detecting intrusions and cyber-attacks more accurately.

IV. PROPOSED SCALABLE FRAMEWORK
Today's ICT systems are considerably more complex and connected, and generate an extremely large volume of data, typically called big data. This is primarily due to the advancement of technologies and the rapid deployment of a large number of applications. Big data is a term for techniques that extract important information from large volumes of data. Allowing access to big data technology in the domain of cyber security, particularly IDS, is of paramount importance [44]. The advancement of big data technology facilitates extracting various patterns of legitimate and malicious activities from large volumes of network and system activity data in a timely manner, which in turn facilitates improving the performance of IDS. However, processing big data using conventional technologies is often difficult [43]. The purpose of this section is to describe the computing architecture and the advanced methods adopted in the proposed framework, such as text representation methods, deep neural networks (DNNs) and the training mechanisms employed in DNNs.

A. SCALABLE COMPUTING ARCHITECTURE
Technologies such as Hadoop MapReduce and Apache Spark in the field of high performance computing are found to be effective solutions for processing big data and providing timely actions. We have developed a scalable framework based on big data techniques and the Apache Spark cluster computing platform [45]. Due to the confidential nature of the research, the scalable framework details cannot be disclosed. The Apache Spark cluster computing framework is set up over Apache Hadoop Yet Another Resource Negotiator (YARN). This framework facilitates efficiently distributing, executing and harvesting tasks. Each system has the following specifications (32 GB RAM, 2 TB hard disk, Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10 GHz) and runs over a 1 Gbps Ethernet network.
The proposed scalable architecture employs distributed and parallel machine learning algorithms with various optimization techniques, which makes it capable of handling very high volumes of network and host-level events. The scalable architecture also leverages the processing capability of general purpose graphical processing unit (GPGPU) cores for faster and parallel analysis of network and host-level events. The framework contains two types of analytic engines: real-time and non-real-time. The purpose of an analytic engine is to monitor network and host-level events in order to generate an alert for an attack. The developed framework can be scaled out to analyze even larger volumes of network event data by adding additional computing resources. The scalability and real-time detection of malicious activities from early warning signals make the developed framework stand out from any system of similar kind.

B. TEXT REPRESENTATION METHODS
System calls are essential in any operating system, depicting the computer processes, and they constitute a humongous amount of unstructured and fragmented text that a typical HIDS uses to detect intrusions and cyber-attacks. In this research we consider text representation methods to classify process behaviours using system call traces. Classical machine learning approaches adopt feature extraction, feature engineering and feature representation methods. However, with an advanced machine learning approach such as deep learning, the feature engineering and feature extraction steps can be completely avoided. We adopt such advanced deep learning along with text representation methods to capture the contextual and sequence related information from system calls. The following feature representation methods from the field of NLP are used to convert the system calls into feature vectors in this study.
• Bag-of-Words (BoW): This classical and most commonly used representation method forms a dictionary by assigning a unique number to each system call. The term document matrix (TDM) and term frequency-inverse document frequency (TF-IDF) are employed to estimate the feature vectors. The drawback is that this method cannot capture the sequence information of system calls [46].
• N-grams: An N-gram text representation has the capability to preserve the sequence information of system calls. The size of N can be 1 (uni-gram), 2 (bi-gram), 3 (tri-gram), 4 (four-gram), etc., which can be chosen appropriately depending on the context.
• Keras Embedding: This follows a sequential representation method that converts the system calls into a numeric vocabulary by simply assigning a unique number to each system call. The size of the vocabulary defines the number of unique system calls, and their frequency of occurrence places them in ascending order within a lookup table. Each system call in a vector is transformed to a numeric value using the lookup table, assigning a
corresponding index. We adopt a fixed-length vector method by transforming all vectors to the same length.
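The three representations above can be illustrated with a short sketch. The system-call trace below is a hypothetical toy example (the call names and vector lengths are illustrative only, not taken from the datasets used in this work): a term-frequency BoW vector, bi-grams, and a padded index sequence of the kind fed to an embedding layer.

```python
from collections import Counter
from itertools import islice

# Hypothetical toy system-call traces, for illustration only.
traces = [["open", "read", "write", "close"],
          ["open", "write", "write", "close"]]

# Vocabulary: a unique number for each distinct system call.
vocab = {call: i for i, call in enumerate(sorted({c for t in traces for c in t}))}

# 1) Bag-of-Words: per-trace term frequencies; sequence order is discarded.
def bow_vector(trace):
    counts = Counter(trace)
    return [counts[call] for call in sorted(vocab, key=vocab.get)]

# 2) N-grams (here bi-grams): consecutive call pairs preserve local order.
def ngrams(trace, n=2):
    return list(zip(*(islice(trace, i, None) for i in range(n))))

# 3) Embedding-style input: each call replaced by its vocabulary index
#    (0 reserved for padding) and padded/truncated to a fixed length.
def index_sequence(trace, maxlen=6, pad=0):
    seq = [vocab[c] + 1 for c in trace]
    return (seq + [pad] * maxlen)[:maxlen]

print(bow_vector(traces[0]))     # one count per vocabulary entry
print(ngrams(traces[0]))         # ordered call pairs
print(index_sequence(traces[1])) # fixed-length, zero-padded indices
```

Only the third form retains both identity and position of every call, which is why it pairs naturally with an embedding layer, whereas BoW vectors feed classical classifiers directly.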
C. DEEP NEURAL NETWORK (DNN)
We employ an artificial neural network (ANN) approach as the computational model, since it is influenced by the characteristics of biological neural networks, to incorporate intelligence in our proposed method. A feed forward network (FFN), a type of ANN, is represented as a directed graph that passes information along edges from one node to another without forming a cycle. We adopt a multilayer perceptron (MLP) model, a type of FFN having three or more layers: one input layer, one or more hidden layers and an output layer, in which each layer has many neurons or units. We select the number of hidden layers by following a hyper parameter selection method. The information is transformed from one layer to the next in the forward direction, with the neurons in adjacent layers being fully connected. An MLP is defined mathematically as O : R^m -> R^n, where m is the size of the input vector x = x_1, x_2, ..., x_{m-1}, x_m and n is the size of the output vector O(x), respectively. The computation of each hidden layer h_i is mathematically defined as

    h_i(x) = f(w_i^T x + b_i)    (1)

where h_i : R^{d_{i-1}} -> R^{d_i}, f : R -> R, w_i in R^{d_i x d_{i-1}}, b_i in R^{d_i}, d_i denotes the size of layer i, and f is the non-linear activation function, which is either a sigmoid (values in the range [0, 1]) or a tangent function (values in the range [-1, 1]). For the Multi-class classification problem, our MLP model uses the softmax function as the non-linear activation function. The softmax function outputs the probability of each class, and the class with the largest probability value is selected. The mathematical formulae for the sigmoid, tangent and softmax activation functions are given below.

    sigmoid(x) = 1 / (1 + e^{-x})    (2)

    tangent(x) = (e^{2x} - 1) / (e^{2x} + 1)    (3)

    softmax(x_i) = e^{x_i} / sum_{j=1}^{n} e^{x_j}    (4)

where x defines an input.
A three-layer MLP with a softmax function in the output layer is the same as a Multi-class logistic regression model. In general terms, for many hidden layers, an MLP is formulated as follows:

    H(x) = H_l(H_{l-1}(H_{l-2}(... (H_1(x)))))    (5)

This way of stacking hidden layers is typically called a deep neural network (DNN). The architecture of a deep neural network (DNN), as shown in Figure 1, contains l hidden layers. It takes inputs x = x_1, x_2, ..., x_{m-1}, x_m and outputs o = o_1, o_2, ..., o_{c-1}, o_c. However, not all the connections and hidden layers along with their units are shown in Figure 1.

FIGURE 1: Architecture of a deep neural network (DNN).

We employ DNNs as a more advanced model of the classical FFN, with each hidden layer using the non-linear activation function ReLU, as it helps to reduce the vanishing and exploding gradient issue [47]. The advantage of ReLU is that it is faster than other non-linear activation functions and facilitates training the MLP model with a large number of hidden layers. The hidden layers define the depth of the neural network and the maximum number of neurons defines the width of the neural network.
The uniqueness of our method is in modelling the loss functions and the ReLU to maximize deep learning efficiency. These are described in detail below.
1) Loss functions: In modelling an MLP, finding optimal parameters is essential towards achieving good performance. This starts with the loss function. A loss function is used to calculate the amount of difference between the predicted and target values. This is defined mathematically as:

    d(t, p) = ||t - p||_2^2    (6)

where t denotes the target value and p denotes the predicted value.
Multi-class classification uses the negative log probability with t as the target class and p(pd) as the probability distribution, as represented below:

    d(t, p(pd)) = -log p(pd)_t    (7)

However, the network receives a list of correct input-output pairs i_o = (i_1, o_1), (i_2, o_2), ..., (i_n, o_n) in the training process. Then, we aim to decrease the mean of the losses as defined below:

    loss(i_n, o_n) = (1/n) sum_{i=1}^{n} d(o_i, f(i_i))    (8)
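As a concrete illustration of the forward pass and mean loss in Eqs. (1)-(8), the following NumPy sketch stacks fully connected layers with ReLU activations and a softmax output. The layer sizes and random weights are hypothetical placeholders, not the configuration used in this work:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    # Non-linear activation f(x) = max(0, x)
    return np.maximum(0.0, z)

def softmax(z):
    # Eq. (4); subtracting the max is a standard numerical-stability trick
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical sizes: 41 input features, two hidden layers, 5 output classes.
sizes = [41, 64, 32, 5]
params = [(rng.standard_normal((m, n)) * 0.1, np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

def forward(x):
    """Eqs. (1) and (5): h_i(x) = f(w_i x + b_i), stacked layer by layer,
    with softmax on the final layer."""
    h = x
    for i, (W, b) in enumerate(params):
        z = W @ h + b
        h = relu(z) if i < len(params) - 1 else softmax(z)
    return h

def mean_loss(inputs, targets):
    """Eq. (8): mean of per-sample losses, here squared error of Eq. (6)."""
    return np.mean([np.sum((t - forward(x)) ** 2) for x, t in zip(inputs, targets)])

probs = forward(rng.standard_normal(41))  # a probability vector over 5 classes
```

With randomly initialized weights the output probabilities are near-uniform; training (Section IV-C continues with the minimization procedure) adjusts the parameters to reduce the mean loss.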
The loss function has to be minimized to get better results from a neural network. The training loss is defined as:

    Train_{i_o}(theta) == L_{i_o}(theta) = (1/n) sum_{i=1}^{n} d(o_i, f_theta(i_i))    (9)

where theta = (w_1, b_1, ..., w_n, b_n).
Minimizing the loss function L_{i_o}(theta) is done through the right selection of the value theta in R^d, and inherently includes the estimation of f_theta(i_i) and grad f_theta(i_i) at cost |i_o|:

    min_theta L(theta)    (10)

Various optimization techniques exist, and we adopt gradient descent as it is the most commonly used. Gradient descent uses the following rule to calculate and update the parameters repeatedly:

    theta_new = theta_old - alpha * grad_theta L(theta)    (11)

where alpha denotes the learning rate, selected based on a hyper parameter selection approach. To find the derivative of L, the backpropagation (backward propagation of errors) algorithm is adopted. Backpropagation uses the chain rule to compute theta in R^d with the aim of minimizing the loss function L_{i_o}(theta). However, most neural networks use an extension of backpropagation called stochastic gradient descent (SGD) for finding the minimizing theta. SGD uses a mini batch of training samples im_om, a subset im_om (a subset of i_o) in which training samples are chosen randomly instead of using the entire training set. The SGD update rule is given as:

    theta_new = theta_old - alpha * grad_theta J(theta; im^(i), om^(i))    (12)

where im^(i), om^(i) denotes an input-output pair of training samples.
2) Rectified Linear Unit (ReLU): The rectified linear unit (ReLU) is found to have great proficiency and a tendency to accelerate the training process [47]. ReLU was the main breakthrough in neural network history for reducing the vanishing and exploding gradient issue. It is found to be the most efficient method in terms of time and cost for training on huge data, in comparison to classical non-linear activation functions such as the sigmoid and tangent functions [47]. We refer to neurons with this non-linearity following [47]. The mathematical formula for ReLU is defined as follows:

    f(x) = max(0, x)    (13)

where x defines an input.

V. PROBLEM FORMULATION, DATASET LIMITATIONS AND STATISTICAL MEASURES
A. PROBLEM FORMULATION FOR NIDS
Generally, network traffic data is collected and stored in raw TCP dump format. Later, this data can be preprocessed and converted into connection records. A connection is simply a sequence of TCP packets starting and ending at well-defined times with well-defined protocols. Each connection record includes 100 bytes of information and is labelled either as Normal or as an Attack with exactly one particular attack type. Each connection record is a vector defined as follows:

    CV = (f_1, f_2, ..., f_n, cl)    (14)

where f denotes features of length n, the value of each f is in R, and cl denotes a class label.

B. PROBLEM FORMULATION FOR HIDS
In general, all the system events, which are the system calls, are collected for each process. Each process p is composed of a sequence of system calls S = s_p1, s_p2, ..., s_pn, where s_p in S, s_p is a finite set of system calls and S is the set of system calls used by the host. A sequence of system call information is used to distinguish the behavior between the Normal and Attack categories. The s_p along with a label such as Normal or Attack can be used to learn the behaviors of Normal and Attack activities.

C. DATASET LIMITATIONS
Most of the datasets which represent current network traffic attacks are private due to privacy and security issues. On the other hand, the datasets which are publicly available are heavily anonymized and suffer from various issues. In particular, they fail to validate that their datasets exhibit the real-world network traffic profile. KDDCup 99 is one of the most commonly used publicly available datasets. Although it has received some harsh criticism, it has been continually used as an effective benchmark dataset for much of the research on NIDS over the years. In line with critiques of the strategy used to create the dataset, [50] presented a detailed analysis of its contents and located non-uniformity and simulation artefacts in the simulated network traffic data. They strived to compare the performance of network anomaly detection between the original and a varied KDDCup 99. They reported that many of the network attributes, particularly the remote client address, TTL, TCP options and TCP window size, have a small and limited range in the KDDCup 99 datasets but actually exhibit a large and growing range in real-world network traffic environments.
In [51], discussions indicate why machine learning classifiers have limited capability in detecting the attacks that belong to the content ('R2L' and 'U2R') category in the KDDCup 99 dataset. With this dataset, none of the machine learning classifiers were able to improve the attack detection rate.
They admitted that the possibility of getting a high attack detection rate in most cases lies in producing a new dataset from a combination of the training and testing datasets. In addition, [52] found that many 'snmpgetattack' records belong to the 'R2L' attack category. As a result, in most cases, machine learning classifiers perform poorly with this data. The failure of DARPA / KDDCup 98 to evaluate classical IDSs was one of its many major criticisms. To mitigate this, [53] used the Snort ID system on DARPA / KDDCup 98 tcpdump traces. The system performed poorly; the accuracy and the false positive rates were impermissible. This is mainly because the system failed to detect the attacks belonging to the 'DoS' and 'Probe' categories with a fixed signature. In contrast, the detection performance for 'R2L' and 'U2R' was much better.
Despite the harsh criticisms, KDDCup 99 has been the most widely used reliable benchmark dataset in most studies related to ID system evaluation and other security related tasks [55]. To resolve the inherent issues that exist in KDDCup 99, [55] proposed a refined version called NSL-KDD. They removed the redundant connection records in the entire train and test data; in addition, the invalid records numbered 136,489 and 136,497 were removed from the test data. This protects the classifier from being biased towards the more frequent connection records. NSL-KDD is still not a real-world representative of network traffic data, and this refined version failed to entirely solve the issues reported by [57]. To enhance the performance in detecting attacks, 10 extra features were added to 14 important features from KDDCup 99 [58]. The Kyoto dataset was generated using honeypots; thus each flow of network traffic was labelled automatically. The normal traffic of the Kyoto dataset was not captured from real-world network traffic. Moreover, the dataset doesn't contain the false positives that help to minimize the number of alerts to the network admin [58]. [56] generated a new dataset following two different profile systems, in which one system generated attacks and the other normal activities. This dataset doesn't contain network traffic of the HTTPS protocol. Most of the attacks were simulated and failed to preserve the characteristics of real-world statistics. In [57], UNSW-NB 15 was proposed. They adopted the notion of profiles that contain comprehensive information on intrusions and applications, protocols, or lower level network entities from modern network traffic, and detailed information about the network traffic. Recently, to provide a benchmark dataset to the research community, [59] generated a reliable dataset. This meets the real-world benign and attack network activities. Moreover, they carried out a detailed evaluation of the network traffic features and detailed experiments on the importance of features for detecting various attacks.
The most widely used datasets for HIDS are KDDCup 98, KDDCup 99 and University of New Mexico (UNM). These datasets were compiled decades ago and most of them are irrelevant for today's operating systems. Recently, [62] made their dataset publicly available, and it has since been used as a new benchmark for evaluating system call based HIDS. The dataset comprises modern vulnerability exploits and attacks.

D. STATISTICAL MEASURES
In evaluation, to estimate the various statistical measures, the ground truth value is required. The ground truth is composed of a set of connection records labeled either Normal or Attack in the case of Binary classification. Let L and A be the number of Normal and Attack connection records in the test dataset, respectively. The following terms are used for determining the quality of the classification models:
• True Positive (TP) - the number of connection records correctly classified to the Normal class.
• True Negative (TN) - the number of connection records correctly classified to the Attack class.
• False Positive (FP) - the number of Normal connection records wrongly classified to the Attack class.
• False Negative (FN) - the number of Attack connection records wrongly classified to the Normal class.
Based on the aforementioned terms, the following most commonly used evaluation metrics are considered.
1) Accuracy: It estimates the ratio of the correctly recognized connection records to the entire test dataset. If the accuracy is higher, the machine learning model is better (Accuracy in [0, 1]). Accuracy serves as a good measure for a test dataset that contains balanced classes and is defined as follows:

    Accuracy = (TP + TN) / (TP + TN + FP + FN)    (15)

2) Precision: It estimates the ratio of the correctly identified attack connection records to the number of all identified attack connection records. If the Precision is higher, the machine learning model is better (Precision in [0, 1]). Precision is defined as follows:

    Precision = TP / (TP + FP)    (16)

3) F1-Score: The F1-Score is also called the F1-Measure. It is the harmonic mean of Precision and Recall. If the F1-Score is higher, the machine learning model is better (F1-Score in [0, 1]). The F1-Score is defined as follows:

    F1-Score = 2 x (Precision x Recall) / (Precision + Recall)    (17)

4) True Positive Rate (TPR): It is also called Recall. It estimates the ratio of the correctly classified Attack connection records to the total number of Attack connection records. If the TPR is higher, the machine learning model is better (TPR in [0, 1]). TPR is
defined as follows:

    TPR = TP / (TP + FN)    (18)

5) False Positive Rate (FPR): It estimates the ratio of the Normal connection records flagged as Attacks to the total number of Normal connection records. If the FPR is lower, the machine learning model is better (FPR in [0, 1]). FPR is defined as follows:

    FPR = FP / (FP + TN)    (19)

6) Receiver Operating Characteristics (ROC) curve: The ROC is plotted based on the trade-off between the TPR on the y axis and the FPR on the x axis across different thresholds. The Area Under the ROC Curve (AUC) is the size of the area under the ROC curve, used along with the ROC as a comparison metric for the machine learning models. If the AUC is higher, the machine learning model is better.

VI. MODELLING THE DATASET
Due to security and privacy issues, most datasets are not publicly available. Additionally, the data which are publicly available are heavily anonymized and do not reflect today's network traffic variety. Due to these issues, an exemplary dataset is yet to be discerned [64]. The details of various IDS datasets are discussed in [66], [59]. A detailed overview of the datasets available between 1998 and 2016 is given by [66]. We consider the pros and cons of existing datasets used in NIDS and HIDS and discuss how our datasets were modelled.

A. DATASETS USED IN NIDS
1) KDDCup 99: The KDDCup 99 dataset was built by processing tcpdump data of the 1998 DARPA intrusion detection challenge dataset. The Mining Audit Data for Automated Models for ID (MADAM ID) framework was used to extract features from the raw tcpdump data. The detailed statistics of the dataset are reported in Table 1. The KDDCup 1998 dataset was created by MIT Lincoln Laboratory using thousands of UNIX machines and hundreds of users accessing those machines. The network traffic data was captured and stored in tcpdump format for 10 weeks. The data of the first seven weeks was used as the training dataset and the rest as the testing dataset. The KDDCup 99 dataset is available in two forms: the full dataset and the 10% dataset. The dataset contains 41 features and 5 classes ('Normal', 'DoS', 'Probe', 'R2L', 'U2R'). These features are grouped into different categories as given below:
• Basic features [1-9]: The packet capture (Pcap) files of tcpdump are used to extract the basic features from the packet headers, TCP segments, and UDP datagrams instead of the payload. This task was carried out using a remodelled network analysis framework, Bro IDS.
• Content features [10-22]: Content features are extracted from the full payload of TCP/IP packets, rooted in domain knowledge, in tcpdump files. Feature analysis of payloads has remained a research area in recent years. Recently, in [65], a deep learning approach was introduced to analyse the entire payload data instead of following the feature extraction process. Content features are mainly used to identify 'R2L' and 'U2R' category attacks. For example, many failed login attempts is the most prominent feature indicating malicious behavior in the entire payload. Unlike other attack categories, 'R2L' and 'U2R' do not have prominent sequential patterns because the events happen in a single connection.
• Time-based traffic features [23-41]: Time-based traffic features are extracted with a specific temporal window of two seconds. These are grouped into 'same host' and 'same service' based on the connection characteristics in the past 2 seconds. To handle slow probing attacks, the aforementioned characteristics are recalculated based on a window of 100 connections to the same host. These are typically termed connection based or host-based traffic features.
2) NSL-KDD: NSL-KDD is the distilled version of the KDDCup 99 intrusion data. Filters are used to remove redundant connection records in KDDCup 99, and the connection records numbered 136,489 and 136,497 are removed from the test data. NSL-KDD can thus protect machine learning algorithms from being biased, and suits misuse detection well in comparison to the KDDCup 99 dataset. It still suffers from not representing real-world network traffic profile characteristics. The detailed statistics of NSL-KDD are reported in Table 1.
3) UNSW-NB15: The cyber security research team of the Australian Centre for Cyber Security (ACCS) introduced a new dataset called UNSW-NB15 to resolve the issues found in the KDDCup 99 and NSL-KDD datasets. This data is generated in a hybrid way, containing the normal and attack behaviors of live network traffic, using the IXIA Perfect Storm tool, which has a repository of new attacks and common vulnerability exposures (CVE), a storehouse containing publicly known information regarding security vulnerabilities and exposures. Two servers were used in the IXIA traffic generator tool, where one server generated normal activities and the other generated malicious activities in the network. The tcpdump tool was used for capturing network packet traces, which took several hours to compile the whole data of 100 GB, divided into 1,000 MB pcaps using tcpdump. From the pcap files, the features were extracted using Argus and Bro-IDS on Linux Ubuntu 14.04. In addition to the above methods, a depth analysis of each packet was done
TABLE 1: Training and Testing connection records from KDDCup 99 and NSL-KDD datasets
Data instances - 10 % data
Attack category Description KDDCup 99 NSL-KDD
Train Test Train Test
Normal Normal connection records 97,278 60,593 67,343 9,710
DoS Attacker aims at making network resources down 391,458 229,853 45,927 7,458
Obtaining detailed statistics of system and network
Probe 4,107 4,166 11,656 2,422
configuration details
R2L Illegal access from remote computer 1,126 16,189 995 2,887
Obtaining the root or super-user access
U2R 52 228 52 67
on a particular computer
Total 494,021 311,029 125,973 22,544

TABLE 2: Training and testing connection records of partial dataset of UNSW-NB15


Class Description Train Test
Normal Normal connection records 56,000 37,000
Attacks related to spams, html files penetrations
Fuzzers 18,184 6,062
and port scans
Attacks related to port scan, html file
Analysis 2,000 677
penetrations and spam
Backdoors is a mechanism
Backdoors 1,746 583
used to access a computer by evading the background existing security.
Intruder aims at making network resources down and
DoS 12,264 4,089
consequently, resources are inaccessible to authorized users
The security hole of operating system or the application software
Exploits is understand by an attacker with the aim to exploit 33,393 11,132
vulnerability
Generic Attacks are related to block-cipher 40,000 18,871
A target system is observe by an attacker to gather
Reconnaissance 10,491 3,496
information for vulnerability
A small part of program termed as payload used in
Shell code 1,133 378
exploitation of software
Worms replicate themselves and distributed to other
Worms 130 44
system through the computer network
Total 93,500 28,481

TABLE 3: Training and testing WSN-DS dataset

Class       Description                                                    Train      Test
Normal      Normal connection records                                    238,046   102,020
Blackhole   A kind of 'DoS' attack in which an attacker attacks the        7,033     3,015
            LEACH protocol and advertises itself as a CH from the
            initial phase
Grayhole    A kind of 'DoS' attack in which an attacker attacks the       10,217     4,379
            LEACH protocol and advertises itself as a CH for other
            nodes
Flooding    An attacker attacks the LEACH protocol in different ways       2,318       994
Scheduling  A scheduling attack happens during the setup phase of          4,646     1,992
            the LEACH protocol
Total                                                                    262,260   112,400
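Since the datasets are evaluated both with Binary and with Multi-class classification, the four attack classes of Table 3 are collapsed into a single 'Attack' label for the Binary case. A minimal sketch of that relabelling, using the training counts from Table 3:

```python
# Training counts from Table 3; everything except "Normal" becomes "Attack".
train = {"Normal": 238046, "Blackhole": 7033, "Grayhole": 10217,
         "Flooding": 2318, "Scheduling": 4646}

def to_binary(label):
    return "Normal" if label == "Normal" else "Attack"

binary_counts = {}
for label, n in train.items():
    binary_counts[to_binary(label)] = binary_counts.get(to_binary(label), 0) + n

print(binary_counts)  # the two-class view of the WSN-DS training split
```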


2169-3536 (c) 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

TABLE 4: Training and testing CICIDS 2017 dataset

Class        Description                                                       Train     Test
Normal       Normal connection records                                        60,000   20,000
SSH-Patator  Secure shell - representation of brute force attack               5,000      897
FTP-Patator  File transfer protocol - representation of brute force            7,000      938
             attack
DoS          Intruder aims at making network resources down so that            6,000    2,000
             they are inaccessible to authorized users
Web          Attacks related to the web                                        2,000      180
Bot          Hosts are controlled by bot owners to perform various             1,500      466
             tasks such as stealing data and sending spam
DDoS         Distributed Denial of Service ('DDoS') is an attempt to           6,000    2,000
             make services down using multiple sources, achieved
             using a botnet
PortScan     A port scan is used to find the specific port open for a          6,000    2,000
             particular service; using this, an attacker can get the
             sender's and receiver's listening information
Total                                                                         93,500   28,481
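Later in this section, 20,000 randomly chosen connection records are projected with t-SNE (Figures 2 and 3). A minimal sketch of such a projection with scikit-learn, on synthetic stand-in data (300 random vectors with the 77 CICIDS 2017 feature dimensions; the sizes here are illustrative, not the paper's setup):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 77))  # stand-in for 300 records with 77 CICIDS 2017 features

# Project to 2-D for visual inspection of class separability.
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(emb.shape)  # one (x, y) point per connection record
```

On the real data, the two embedding coordinates would be scatter-plotted and colored by class label, as in Figures 2 and 3.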

with 12 algorithms which are developed using C#. The data is accessible in two forms:
      a) Full connection records, consisting of 2 million connection records.
      b) A partition of the full connection records, composed of 175,341 train connection records and 82,332 test connection records covering ten classes. The partitioned dataset consists of 42 features with their corresponding class labels, Normal and nine different attack categories. The information regarding the simulated attack categories and the detailed statistics are described in Table 2.
   4) Kyoto: The honeypot systems of Kyoto university network traffic data have 24 statistical features. Among the 24 features, 14 are from KDDCup 99. These features are important as they are collected from the raw traffic data of the Kyoto university honeypot systems. Additionally, 10 more features are identified from the honeypot's network traffic. In this work, the network logs of the year 2015 are considered. The logs are preprocessed and divided into training and testing datasets. The detailed statistics of the connection records are reported in Table 5.

TABLE 5: Training and testing Kyoto dataset

Class     Training    Testing
Normal   2,384,645  1,405,391
Attack     670,037    158,532
Total    3,054,682  1,563,923

   5) WSN-DS: This is an IDS dataset developed for wireless sensor networks (WSN). It is composed of four different types of 'DoS' attacks: Blackhole, Grayhole, Flooding, and Scheduling. The Low-energy adaptive clustering hierarchy (LEACH) protocol was used to collect data from Network Simulator 2 (NS-2), which were then preprocessed to generate 23 features. This dataset was termed WSN-DS and its detailed statistics are reported in Table 3.
   6) CICIDS2017: This dataset includes contemporary benign and attack activities which depict real-time network traffic. The main interest during the creation of this dataset was collecting realistic background traffic. Using the B-profile system, benign background traffic was collected; this benign traffic contains the characteristics of 25 users based on the HTTP, HTTPS, FTP, SSH and email protocols. Network traffic was collected for five days, with normal activity traffic on one day and attacks injected on the other days. The various attacks injected were Brute Force FTP, Brute Force SSH, 'DoS', Heartbleed, Web Attack, Infiltration, Botnet and 'DDoS'. The authors also claimed that their dataset covers the 11 important criteria discussed by [66]. The detailed information on the CICIDS 2017 dataset is reported in Table 4.

We randomly chose 20,000 connection records from the NIDS datasets and passed them into t-SNE [63]; their visual representations are given in Figure 2 for KDDCup 99 and in Figure 3 for CICIDS 2017. Both datasets are non-linearly separable, and the connection records of CICIDS 2017 are considered more complex in comparison with KDDCup 99. Also, the CICIDS 2017 dataset was released recently and contains attacks which have occurred recently. Moreover, the CICIDS 2017 dataset has the characteristics of real-time network traffic.

B. DATASETS USED IN HIDS
Windows and Linux are the most well-known and most commonly used operating systems (OS). The most commonly used datasets for host-based intrusion


detection are KDDCup 98, KDDCup 99¹ and UNM². These datasets were compiled decades ago and do not include the attacks of modern computer systems. The issues with these datasets were discussed in detail by [62], who proposed the new datasets ADFA-LD and ADFA-WD.

FIGURE 2: t-SNE visualization of KDDCup 99

FIGURE 3: t-SNE visualization of CICIDS 2017

   1) ADFA Linux (ADFA-LD)/ADFA Windows (ADFA-WD): ADFA-LD is a dataset of system call sequences which was collected on hosts running a modern operating system [62]. These hosts were connected to a Linux local server. The dataset comprises several traces of system calls collected under different situations, impersonating real-time conditions. These system call traces represent system-level vulnerabilities and attacks. The server permitted many services such as remote access, web server and database on Ubuntu 11.04 (Linux kernel 2.6.38). The SSH, FTP and MySQL 14.14 services used their default ports. Apache 2.2.17, PHP 5.3.5 and TikiWiki 8.1 were installed as web-based services and web-based collaborative tools respectively. The detailed statistics of ADFA-LD are reported in Table 6. The attack types used in the ADFA-LD dataset collection are reported in Table 7. The attack dataset of ADFA-LD is randomly split into 55% training and 45% testing. The normal traces of the training and validation data are merged and randomly split into 55% training and 45% testing.

TABLE 6: ADFA-LD and ADFA-WD datasets

                    ADFA-LD                   ADFA-WD
Dataset      Traces  System calls      Traces  System calls
Train           833       308,077         355    13,504,419
Validation    4,372     2,122,085       1,827   117,918,735
Attack          746       317,388       5,542    74,202,804
Total         5,951     2,747,550       7,724   205,625,958

      The ADFA-WD comprises system calls and dynamic link library (DLL) traces for various attacks. It was collected on a Windows XP SP2 host using the Procmon program [62]. The system setup enabled the default firewall and Norton AV 2013. The system was open for file sharing and ran different applications such as an FTP server, streaming media server, database server, web server and PDF server. Using the Metasploit framework and other custom approaches, 12 different known vulnerabilities were exploited in the installed applications. The detailed statistics of the ADFA-WD dataset are reported in Table 6.

TABLE 7: Types of attacks in ADFA-LD dataset

Attack            Number of Traces
Adduser                         91
Java-Meterpreter               124
Hydra-FTP                      162
Hydra-SSH                      176
Meterpreter                     75
Web-Shell                      118

VII. EXPERIMENTAL DESIGN
All the experiments were implemented in Python on Ubuntu 14.04 LTS. All classical machine learning algorithms were implemented using Scikit-learn³. Deep neural networks (DNNs) were implemented using GPU-enabled TensorFlow⁴ as the backend with the Keras⁵ higher-level framework. The GPU was an NVidia GK110BGL Tesla K40, and the CPU configuration was 32 GB RAM, 2 TB hard disk and Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz, running over a 1 Gbps Ethernet network. To evaluate the performance of DNNs and various classical machine learning classifiers on various

¹ http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
² https://www.cs.unm.edu/~immsec/systemcalls.htm
³ https://scikit-learn.org/stable/
⁴ https://www.tensorflow.org/
⁵ https://keras.io/


NIDS and HIDS datasets, the following test cases were considered:
   1) Classifying the network connection record as either benign or attack, with all features.
   2) Classifying the network connection record as either benign or attack and categorizing an attack into its categories, with all features.
   3) Classifying the network connection record as either benign or attack and categorizing an attack into its categories, with minimal features.

A. FINDING OPTIMAL PARAMETERS IN DNNS
As DNNs are parametrized, their performance depends on finding the optimal parameters. The determination of the optimal DNN network parameters and network topologies was done only for the KDDCup 99 dataset. To identify the ideal parameters for the DNNs, a medium-sized architecture was used for experiments with specific hidden units, learning rate and activation function. A medium-sized DNN contains 3 layers: an input layer, a hidden (fully connected) layer and an output layer. For KDDCup 99, the input layer contains 41 neurons; the hidden layer contains 128, 256, 384, 512, 640, 768, 896 or 1,024 units; and the output layer contains 1 neuron when classifying the connection record as either normal or attack, or 5 neurons when also categorizing an attack into the corresponding attack categories. The units between the input and hidden layer and between the hidden and output layer are fully connected. Initially, the train and test datasets were normalized using L2 normalization. Two trials of experiments were run for hidden units 128, 256, 384, 512, 640, 768, 896 and 1,024 with a medium-sized DNN, each run for 300 epochs. The DNNs with the various unit counts learnt the patterns of normal connection records within 200 epochs, earlier than those of attacks; 200 epochs were required to capture the significant features which distinguish attack connection records. After 200 epochs, the performance on normal connection records fluctuated due to overfitting. Each unit count took a varying number of epochs to attain considerable performance. It was found that the layer containing 1,024 units showed the highest attack detection rate. When we increased the number of hidden units from 1,024 to 2,048, the attack detection rate deteriorated. Hence, we decided to use 1,024 units for the rest of the experiments. The medium-sized DNN with 1,024 units in the hidden layer was used for experiments with Multi-class classification of KDDCup 99, using three trials of experiments for each hidden unit count until 500 epochs, as this DNN performed well in comparison to the other unit counts. A medium-sized DNN with fewer units (128, 256 and 384) learned the patterns of the high-frequency attack 'DoS', and the detection of 'DoS' remained the same for the other unit counts of a medium-sized DNN. Due to the large number of 'DoS' connection records, a DNN network with fewer units was able to achieve an optimal detection rate. An acceptable detection rate for the 'Probe' category of attacks was found with 512 and 640 units. A medium-sized DNN with 896 hidden units performed well for the detection of 'R2L' attacks in comparison to the other unit counts, while 1,024 units were required to detect the 'U2R' attacks. Once the number of units increased beyond 1,024, the performance of the DNN network deteriorated. Considering all these test experiments, 1,024 was set as the ideal number of hidden layer units. To achieve considerable performance, each DNN network topology required a varying number of epochs. DNN networks with fewer parameters achieved good performance up to 100 epochs, but by 500 epochs the complex DNN networks performed better than the DNN networks with fewer parameters.

In order to find an optimal learning rate, three trials of experiments for 500 epochs with the learning rate varying in the range [0.01-0.5] were run. These experiments used a medium-sized DNN with 1,024 units. The learning rate has a strong impact on the training speed; hence we selected the range [0.01-0.5]. The peak attack detection rate was obtained when the learning rate was 0.1. There was a quick decrease in the attack detection rate when the learning rate was 0.2, with peaks again at learning rates of 0.35 and 0.45, compared to the learning rate 0.1. The attack detection rate intensified when the experiments were run until 1,000 epochs. As the more complex architectures examined in this experiment showed lower performance for fewer than 500 epochs, we decided to use the learning rate 0.1 for the rest of the experimentation process, because learning rates greater than 0.1 were found to be time consuming. To find the optimal learning rate for Multi-class classification of KDDCup 99, we ran two trials of experiments for learning rates in the range [0.01-0.5]. The experiments with the lower learning rate of 0.01 showed better performance for the 'DoS' and 'Probe' attacks; when we increased the learning rate from 0.01, the attack detection rate remained the same. Experiments with a learning rate of 0.1 showed optimal performance in detecting the 'R2L' and 'U2R' attack categories. Based on these observations of attack detection performance in both Binary and Multi-class classification, the learning rate was set to 0.1.

In a third trial of experiments, we also ran experiments with the sigmoid and tanh activation functions for both Binary and Multi-class classification. These functions achieved better attack detection rates than ReLU for 100 epochs in Binary classification, but when the same set of experiments was run for 500 epochs, the experiments with the ReLU activation function performed better than those with the sigmoid and tanh activation functions. In the case of Multi-class classification, the performance of ReLU was good in comparison to the sigmoid and tanh activation functions. Thus, we decided to use the ReLU activation function for the rest of the experiments. All the models were trained using the adam optimizer with a batch size of 64 for 500 epochs to

TABLE 8: Configuration of proposed DNN model

Layers  Type                 Output shape   Units  Activation  Parameters
0-1     Fully connected      (None, 1024)   1,024  ReLU            43,008
1-2     Batch Normalization  (None, 1024)                           4,096
2-3     Dropout (0.01)       (None, 1024)                               0
3-4     Fully connected      (None, 768)      768  ReLU           787,200
4-5     Batch Normalization  (None, 768)                            3,072
5-6     Dropout (0.01)       (None, 768)                                0
6-7     Fully connected      (None, 512)      512  ReLU           393,728
7-8     Batch Normalization  (None, 512)                            2,048
8-9     Dropout (0.01)       (None, 512)                                0
9-10    Fully connected      (None, 256)      256  ReLU           131,328
10-11   Batch Normalization  (None, 256)                            1,024
11-12   Dropout (0.01)       (None, 256)                                0
12-13   Fully connected      (None, 128)      128  ReLU            32,896
13-14   Batch Normalization  (None, 128)                              512
14-15   Dropout (0.01)       (None, 128)                                0
15-16   Fully connected      Output layer: sigmoid for Binary and softmax for
                             Multi-class classification. Output units (Binary,
                             Multi-class): KDDCup 99 - 1, 5; NSL-KDD - 1, 5;
                             UNSW-NB15 - 1, 10; Kyoto - 1; WSN-DS - 1, 5;
                             CICIDS 2017 - 1, 6
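Table 8 specifies the model completely, so it can be sketched directly in Keras (the framework used here); the per-layer parameter counts of this sketch match the table (43,008 for the first fully connected layer, and so on). The output layer is shown for the Binary case:

```python
from tensorflow import keras
from tensorflow.keras.layers import BatchNormalization, Dense, Dropout

def build_dnn(input_dim=41, n_classes=1):
    """DNN of Table 8: five Dense -> BatchNorm -> Dropout(0.01) blocks, then output."""
    model = keras.Sequential([keras.Input(shape=(input_dim,))])
    for units in (1024, 768, 512, 256, 128):
        model.add(Dense(units, activation="relu"))
        model.add(BatchNormalization())
        model.add(Dropout(0.01))
    if n_classes == 1:   # Binary classification: sigmoid + Binary cross entropy
        model.add(Dense(1, activation="sigmoid"))
        loss = "binary_crossentropy"
    else:                # Multi-class classification: softmax + Categorical cross entropy
        model.add(Dense(n_classes, activation="softmax"))
        loss = "categorical_crossentropy"
    model.compile(optimizer="adam", loss=loss, metrics=["accuracy"])
    return model

model = build_dnn(input_dim=41, n_classes=1)  # KDDCup 99, Binary classification
```

Training would then be `model.fit(x_train, y_train, batch_size=64, epochs=500, validation_data=...)`, matching the settings stated in the text.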

monitor validation accuracy.

B. FINDING AN OPTIMAL NETWORK TOPOLOGY OF DNN
The following network topologies were used to choose the best network topology for training an IDS model with KDDCup 99:
   1) DNN 1 layer
   2) DNN 2 layers
   3) DNN 3 layers
   4) DNN 4 layers
   5) DNN 5 layers
For all the above network topologies, we ran 3 trials of experimentation for 300 epochs each. We observed that most of the deep learning architectures learnt the Normal category patterns of the input data in fewer than 400 epochs, while the number of epochs required for discovering the Attack category fluctuated. Both the DNN 1 layer and DNN 2 layers networks completely failed to learn the attack categories 'R2L' and 'U2R'. The performance on the 'DoS' and 'Probe' attack categories was good with DNN 3 layers in comparison to DNN 2 layers and DNN 1 layer. The complex network architectures required a large number of epochs in order to reach the optimum accuracy. The performance of DNN 5 layers on the various Attack categories and the Normal category was good compared to the other DNN network topologies. Considering all these factors, we decided to use the 5-layer DNN network for the remaining experimentation process.

In order to increase the training speed and avert overfitting, we used batch normalization and dropout (0.01). When we ran experiments without dropout, the models ended up overfitting. Also, the experiments with batch normalization achieved better results in comparison to the networks without batch normalization [10]. For HIDS, the best performing DNNs from NIDS were used as such. In HIDS, we followed hyperparameter tuning only for the conversion of system calls into a numeric representation. For N-grams, 3 trials of experiments were run with 1-gram, 2-gram, 3-gram and 4-gram representations and different DNN network topologies. The DNN network topologies with the 3-gram system call representation performed well in comparison to the other N-gram system call representations.

C. PROPOSED DNN ARCHITECTURE
This work proposes a unique DNN architecture for NIDS and HIDS composed of an input layer, 5 hidden layers and an output layer. The hierarchical layers in the DNN facilitate the extraction of highly complex features and provide better pattern recognition capabilities on IDS data. Each layer estimates non-linear features that are passed to the next layer, and the last layer in the DNN performs the classification. The input layer contains 41 neurons for KDDCup 99, 41 neurons for NSL-KDD, 43 neurons for UNSW-NB15, 17 neurons for WSN-DS and 77 neurons for CICIDS 2017. The output layer contains 1 neuron for Binary classification for all datasets; for Multi-class classification it contains 5 neurons for KDDCup 99, 5 neurons for NSL-KDD, 10 neurons for UNSW-NB15, 5

neurons for WSN-DS and 8 neurons for CICIDS 2017. The detailed information and configuration of the DNN architecture is shown in Table 8. The DNN is trained using the backpropagation mechanism [60]. Generally, the units from the input to the hidden layers and from the hidden to the output layer are fully connected. The DNN is composed of various components; a brief description of each is given below.

Fully connected layer: This layer is called fully connected because each of its units is connected to every other unit in the succeeding layer. Generally, the fully connected layers map the data into a higher-dimensional space; the output is more accurate when the dimensionality of the data is higher. It uses ReLU as the non-linear activation function.

Batch Normalization and Regularization: Dropout (0.01) and batch normalization [10] were used between the fully connected layers to obviate overfitting and speed up the DNN model training. Dropout randomly removes neurons together with their connections. In our alternative architectures, the DNNs could easily overfit the training data without regularization, even when trained on a large number of samples.

Classification: The last layer is a fully connected layer which uses the sigmoid activation function for Binary classification and the softmax activation function for Multi-class classification. The prediction loss for sigmoid is defined using Binary cross entropy, and the prediction loss for softmax is defined using Categorical cross entropy, as follows.

The prediction loss for Binary classification is estimated using Binary cross entropy, given by

    loss(pd, ed) = -(1/N) * sum_{i=1}^{N} [ ed_i log(pd_i) + (1 - ed_i) log(1 - pd_i) ]    (20)

where pd is a vector of predicted probabilities for all samples in the testing dataset and ed is a vector of expected class labels, whose values are either 0 or 1.

The prediction loss for Multi-class classification is estimated using Categorical cross entropy, given by

    loss(pd, ed) = - sum_x ed(x) log(pd(x))    (21)

where ed is the true probability distribution and pd is the predicted probability distribution. We used adam as the optimizer to minimize the Binary cross entropy and Categorical cross entropy losses.

VIII. SCALE-HYBRID-IDS-ALERTNET (SHIA) FRAMEWORK
An IDS has become an indispensable tool for any type of organization or industry due to the ever-growing volume of data and internet use. Many attempts have been made to develop solutions for IDS. These methods use a single host for storage and computational resources, and the algorithms are not distributed. With such legacy solutions, it is very difficult to monitor and identify intrusions in today's networks, because networks are high-speed and attacks are more numerous and occur rapidly. The legacy intrusion detection methods existing on the internet struggle to keep watch over networks efficiently. To address this, based on [61], our work leverages distributed computing and distributed machine learning models to propose the Scale-Hybrid-IDS-AlertNet (SHIA) framework for effective identification of intrusions and attacks at both the network level and the host level. The framework provides a scalable design and acts as a distributed monitoring and reporting system. SHIA enhances computational performance using the characteristics of distributed computing and distributed machine learning algorithms through a hybrid system placed at different locations in the network. The deployed system is designed to use computational resources optimally while keeping response latency low when monitoring a critical system. The SHIA framework is basically decomposed into the two modules described below:

   1) Packet and system call processing module: In this module, the networks to be monitored are connected to a single port-mirroring switch, which replicates the flow of the entire network traffic of all the switches. The proposed SHIA IDS is required to monitor a network composed of different subnets, each with n different machines. The monitored networks are hosts composed of computer machines which allow users to communicate and transfer data. All the traffic generated by the internet was collected, without considering the internal traffic between networks. In the SHIA framework, one of the switches in the network is used as a port-mirroring switch connected to a traffic collector. This module collects network traffic data using the Netmap packet capturing tool and stores it in a NoSQL database. Feature vectors are passed into the DNN module, and a copy of the data is passed into the NoSQL database. Likewise, the system calls and configuration files are collected in a distributed manner following the methodology provided in [62]. These are passed to the NoSQL database, followed by the text representation method that maps the system calls to feature vectors. A copy of these feature vectors is dumped into the NoSQL database, and the feature vectors are then passed into the DNN module for classification.
   2) DNN module: From the experimental analysis, we found that the DNN performed well over the other algorithms in all cases of HIDS and NIDS. Thus, the DNN is used to model the network activities with the aim of detecting attacks more accurately. DNNs require a large volume of network and host-level events to learn the behaviors of legitimate and malicious activities. To make the classifier more generalizable, the network traffic activities can be collected at different times with different users in an isolated network. Finally, the DNN



FIGURE 4: Train accuracy (a) KDDCup 99 and NSL-KDD, (b) UNSW-NB-15 and WSN-DS, (c) Visualization of 100
connection records with their corresponding activation values of the last hidden layer neurons from Kyoto



FIGURE 5: ROC curves of (a) KDDCup 99-using classical machine learning classifiers, (b) KDDCup 99-using DNNs, (c)
NSL-KDD-using classical machine learning classifiers, (d) NSL-KDD-using DNNs



FIGURE 6: ROC curves of (a) UNSW-NB 15-using classical machine learning classifiers, (b) UNSW-NB 15-using DNNs, (c)
Kyoto-using classical machine learning classifiers, (d) Kyoto-using DNNs



FIGURE 7: ROC curves of (a) WSN-DS-using classical machine learning classifiers, (b) WSN-DS-using DNNs, (c) CICIDS
2017-using classical machine learning classifiers, (d) CICIDS 2017-using DNNs
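The ROC curves of Figures 5-7 plot TPR against FPR at every score threshold, and AUC summarizes each curve in one number. With scikit-learn (the library used here for the classical classifiers), both come from two calls; a minimal sketch on toy labels and scores, not the paper's data:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Toy ground truth and scores; the scores play the role of the DNN's sigmoid outputs.
y_true = np.array([0, 0, 1, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.9, 0.2])

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # points of the ROC curve
auc = roc_auc_score(y_true, y_score)               # area under that curve
print(f"AUC = {auc:.3f}")
```

Plotting `fpr` against `tpr` (e.g. with matplotlib) reproduces curves of the kind shown above.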




FIGURE 8: Saliency map for a randomly chosen connection record from (a) KDDCup 99, (b) CICIDS 2017 and visualization of 100 connection records with their corresponding activation values of the last hidden layer neurons from (c) KDDCup 99, and (d) NSL-KDD
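The paper does not state how the saliency maps of Figure 8 were computed; a common choice is the absolute gradient of the model output with respect to the input features, which takes a few lines with TensorFlow. A sketch using an untrained stand-in model (the trained DNN would be substituted):

```python
import numpy as np
import tensorflow as tf

def saliency(model, x):
    # Absolute gradient of the model output w.r.t. the input features.
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        y = model(x)
    return tf.abs(tape.gradient(y, x)).numpy()

# Untrained stand-in with the 41 KDDCup 99 input features (illustrative only).
model = tf.keras.Sequential([tf.keras.Input(shape=(41,)),
                             tf.keras.layers.Dense(1, activation="sigmoid")])
record = np.random.rand(1, 41).astype("float32")
s = saliency(model, record)  # shape (1, 41): one importance score per input feature
```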

module outputs are passed to the Front End Broker, which displays the results to the network admin. The implementation details of how to conduct a comprehensive experimental analysis will be considered as one of the significant directions for future work.

IX. RESULTS
Publicly available NIDS and HIDS datasets were used to evaluate the performance of classical machine learning classifiers and DNNs in order to identify a baseline method. These datasets were separated into train and test datasets and normalized using L2 normalization. The train datasets were used to train the machine learning models and the test datasets were used to evaluate the trained models. The train accuracy of Multi-class classification using DNNs is shown in Figure 4a for KDDCup 99 and NSL-KDD and in Figure 4b for UNSW-NB15 and WSN-DS. For the KDDCup 99 and NSL-KDD datasets, most of the DNN network topologies showed train accuracy in the range 95% to 99%. For UNSW-NB15 and WSN-DS, all the DNN network topologies showed train accuracy in the range 65% to 75%. Several observations were extracted, including ROC curves. The ROC curves for KDDCup 99, NSL-KDD, UNSW-NB15, Kyoto, WSN-DS and CICIDS 2017 are shown in Figures 5a-5d, 6a-6d and 7a-7d respectively. In most of the cases, the DNN performed well in comparison to the classical machine learning classifiers, with AUC used as the standard metric. This indicates that the DNN obtained the highest TPR and the lowest FPR, in some cases close to 0. The FPR obtained is lower in comparison to the other classical machine learning classifiers on all the datasets. The experiments with the 3-gram representation performed well as compared to 1-gram and 2-gram.

TABLE 9: Test results of DNNs for Binary class classification

Architecture   Accuracy  Precision  Recall  F-score
Binary classification - KDDCup 99
DNN 1 layer       0.929      0.999   0.914    0.954
DNN 2 layers      0.929      0.998   0.914    0.954
DNN 3 layers      0.930      0.999   0.914    0.955
DNN 4 layers      0.930      0.998   0.915    0.955
DNN 5 layers      0.927      0.994   0.915    0.953
Binary classification - NSL-KDD
DNN 1 layer       0.801      0.692   0.969    0.807
DNN 2 layers      0.794      0.685   0.965    0.801
DNN 3 layers      0.793      0.684   0.967    0.801
DNN 4 layers      0.794      0.684   0.967    0.802
DNN 5 layers      0.789      0.680   0.963    0.797
Binary classification - UNSW-NB15
DNN 1 layer       0.784      0.944   0.725    0.820
DNN 2 layers      0.751      0.979   0.649    0.780
DNN 3 layers      0.763      0.963   0.678    0.795
DNN 4 layers      0.765      0.946   0.695    0.801
DNN 5 layers      0.761      0.951   0.684    0.796
Binary classification - WSN-DS
DNN 1 layer       0.992      0.946   0.971    0.959
DNN 2 layers      0.981      0.842   0.973    0.903
DNN 3 layers      0.970      0.764   0.974    0.856
DNN 4 layers      0.980      0.844   0.967    0.901
DNN 5 layers      0.982      0.931   0.871    0.900
Binary classification - CICIDS 2017
DNN 1 layer       0.963      0.908   0.973    0.939
DNN 2 layers      0.951      0.897   0.942    0.919
DNN 3 layers      0.944      0.854   0.979    0.912
DNN 4 layers      0.936      0.836   0.976    0.901
DNN 5 layers      0.931      0.827   0.974    0.894
Binary classification - Kyoto
DNN 1 layer       0.877      0.917   0.950    0.933
DNN 2 layers      0.875      0.915   0.949    0.932
DNN 3 layers      0.877      0.913   0.954    0.933
DNN 4 layers      0.875      0.921   0.943    0.931
DNN 5 layers      0.885      0.913   0.964    0.938

A. PERFORMANCE COMPARISONS
The detailed results for Binary as well as Multi-class classification of the various classical machine learning classifiers and DNNs are reported in Table 9, Table 10 and Table 11, Table 12 respectively. In terms of accuracy, it is noted that the DT, AB and RF classifiers performed better than the other classifiers
2169-3536 (c) 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2019.2895334, IEEE Access

TABLE 10: Test results of DNNs for Multi-class classification

Architecture Accuracy Precision Recall F-score
Multi-class classification - KDDCup 99
DNN 1 layer 0.926 0.938 0.926 0.922
DNN 2 layers 0.926 0.944 0.926 0.920
DNN 3 layers 0.935 0.920 0.935 0.925
DNN 4 layers 0.929 0.911 0.929 0.918
DNN 5 layers 0.925 0.934 0.925 0.921
Multi-class classification - NSL-KDD
DNN 1 layer 0.778 0.780 0.778 0.760
DNN 2 layers 0.777 0.777 0.777 0.757
DNN 3 layers 0.781 0.785 0.781 0.764
DNN 4 layers 0.780 0.816 0.780 0.763
DNN 5 layers 0.785 0.810 0.785 0.765
Multi-class classification - UNSW-NB15
DNN 1 layer 0.645 0.614 0.645 0.586
DNN 2 layers 0.660 0.623 0.660 0.596
DNN 3 layers 0.660 0.622 0.660 0.594
DNN 4 layers 0.657 0.606 0.657 0.593
DNN 5 layers 0.651 0.597 0.651 0.585
Multi-class classification - WSN-DS
DNN 1 layer 0.980 0.983 0.980 0.980
DNN 2 layers 0.969 0.969 0.969 0.968
DNN 3 layers 0.977 0.976 0.977 0.976
DNN 4 layers 0.965 0.964 0.965 0.963
DNN 5 layers 0.964 0.970 0.964 0.966
Multi-class classification - CICIDS 2017
DNN 1 layer 0.960 0.969 0.960 0.962
DNN 2 layers 0.959 0.970 0.959 0.962
DNN 3 layers 0.962 0.972 0.962 0.965
DNN 4 layers 0.948 0.965 0.948 0.953
DNN 5 layers 0.956 0.962 0.956 0.957

TABLE 11: Test results of various classical machine learning classifiers for binary class classification

Algorithm Accuracy Precision Recall F-score
Binary classification - KDDCup 99
LR 0.811 0.994 0.769 0.867
NB 0.877 0.994 0.852 0.918
KNN 0.925 0.998 0.909 0.952
DT 0.929 0.997 0.915 0.954
AB 0.925 0.996 0.910 0.951
RF 0.927 0.999 0.911 0.953
SVM-rbf 0.877 0.994 0.852 0.918
Binary classification - NSL-KDD
LR 0.826 0.915 0.744 0.820
NB 0.829 0.865 0.805 0.834
KNN 0.910 0.926 0.905 0.915
DT 0.930 0.928 0.943 0.935
AB 0.934 0.961 0.914 0.937
RF 0.929 0.946 0.919 0.933
SVM-rbf 0.837 0.769 0.993 0.867
Binary classification - UNSW-NB15
LR 0.743 0.955 0.653 0.775
NB 0.773 0.854 0.805 0.829
KNN 0.810 0.932 0.778 0.848
DT 0.897 0.982 0.864 0.919
AB 0.900 0.985 0.866 0.922
RF 0.903 0.988 0.867 0.924
SVM-rbf 0.653 0.998 0.492 0.659
Binary classification - WSN-DS
LR 0.970 0.884 0.777 0.827
NB 0.831 0.324 0.765 0.455
KNN 0.943 0.699 0.666 0.682
DT 0.991 0.949 0.951 0.950
AB 0.986 0.897 0.964 0.929
RF 0.996 0.993 0.963 0.978
SVM-rbf 0.915 0.997 0.083 0.153
Binary classification - CICIDS 2017
LR 0.839 0.685 0.850 0.758
NB 0.313 0.300 0.979 0.459
KNN 0.910 0.781 0.968 0.865
DT 0.935 0.839 0.965 0.898
AB 0.941 0.887 0.918 0.902
RF 0.940 0.849 0.969 0.905
SVM-rbf 0.799 0.992 0.328 0.493
Binary classification - Kyoto
LR 0.895 0.899 0.995 0.944
NB 0.534 0.922 0.526 0.670
KNN 0.856 0.932 0.905 0.918
DT 0.830 0.925 0.883 0.903
AB 0.889 0.906 0.978 0.940
RF 0.882 0.910 0.963 0.936
SVM-rbf 0.895 0.899 0.995 0.944

namely LR, NB, KNN and SVM-rbf. Additionally, the performance of the DT, AB and RF classifiers remains in the same range across the different datasets, whereas the performance of LR, NB, KNN and SVM-rbf varies across datasets. This indicates that the DT, AB and RF classifiers are more generalizable and can detect new attacks. In the case of multi-class classification, the performance of AB is lower than that of DT and RF, but it still performed better than LR, NB, KNN and SVM-rbf. This is due to the fact that both AB and SVM are not directly applicable to multi-class classification problems; in the multi-class case, we deal with strengthening the classifier in identifying each individual attack. In the experiments on KDDCup 99 and NSL-KDD, all the classical machine learning classifiers obtained a lower TPR for both 'R2L' and 'U2R' than for the other categories such as 'DoS' and 'Probe'. The primary reason is that both of these attack categories contain very few samples in the training sets; thus, during training, the classifiers give less preference to these attack categories. In terms of accuracy, the performance of the DNN is clearly superior

to that of the classical machine learning algorithms, often by a large margin, in both binary and multi-class classification. Moreover, across the DNN network topologies, the performance in terms of accuracy is close to one another. For multi-class classification, the accuracy, true positive rate (TPR) and false positive rate (FPR) are estimated for each class. The detailed results are shown in Table 15 for KDDCup 99, Table 16 for NSL-KDD, Table 17 for WSN-DS, Table 18 for UNSW-NB15 and Table 19 for CICIDS 2017. The proposed method, a DNN with 3 layers, showed an accuracy of 93.5%, which varies by -0.32 from the top performing method [3]. However, the proposed method contains fewer parameters and is thus computationally inexpensive. Moreover, the top performing method [3] uses an LSTM, and LSTM has primarily been used on raw data [10]; there is a chance that the performance of the LSTM can degrade when it sees real-time datasets or completely unseen samples. The performance obtained in terms of FPR is lower than that of the other classical machine learning classifiers on all the datasets. A false positive occurs when the IDS identifies a connection record as an attack when it is actually normal traffic. A false negative occurs when the IDS fails to interpret a malicious connection record as an attack. In these cases, the IDS must be carefully tuned to ensure that both are kept very low. The reported results can be further enhanced by carefully following a hyperparameter selection method with a highly complex DNN architecture. In the experiments with HIDS, Keras embedding performed better than the N-gram and tf-idf text

TABLE 12: Test results of various classical machine learning classifiers for Multi-class classification

Algorithm Accuracy Precision Recall F-score
Multi-class classification - KDDCup 99
LR 0.801 0.872 0.801 0.804
NB 0.857 0.843 0.857 0.834
KNN 0.921 0.924 0.921 0.912
DT 0.924 0.934 0.924 0.918
AB 0.260 0.821 0.260 0.183
RF 0.925 0.944 0.925 0.918
SVM-rbf 0.895 0.902 0.895 0.890
Multi-class classification - NSL-KDD
LR 0.612 0.509 0.612 0.530
NB 0.295 0.207 0.295 0.184
KNN 0.731 0.720 0.731 0.684
DT 0.763 0.767 0.763 0.728
AB 0.621 0.651 0.621 0.594
RF 0.753 0.814 0.753 0.715
SVM-rbf 0.702 0.689 0.702 0.656
Multi-class classification - UNSW-NB15
LR 0.538 0.414 0.538 0.397
NB 0.437 0.579 0.437 0.396
KNN 0.622 0.578 0.622 0.576
DT 0.733 0.721 0.733 0.705
AB 0.608 0.502 0.608 0.526
RF 0.755 0.755 0.755 0.724
SVM-rbf 0.581 0.586 0.581 0.496
Multi-class classification - WSN-DS
LR 0.944 0.945 0.944 0.943
NB 0.817 0.939 0.817 0.862
KNN 0.926 0.929 0.926 0.926
DT 0.989 0.989 0.989 0.989
AB 0.987 0.987 0.987 0.987
RF 0.994 0.994 0.994 0.994
SVM-rbf 0.915 0.916 0.915 0.880
Multi-class classification - CICIDS 2017
LR 0.870 0.889 0.870 0.868
NB 0.250 0.767 0.250 0.188
KNN 0.909 0.949 0.909 0.922
DT 0.940 0.965 0.940 0.949
AB 0.641 0.691 0.641 0.653
RF 0.944 0.970 0.944 0.953
SVM-rbf 0.799 0.757 0.799 0.723

TABLE 13: Test results using minimal feature sets

Architecture Accuracy (11 features / 8 features / 4 features)
KDDCup 99
DNN 1 layer 0.842 0.901 0.874
DNN 2 layers 0.898 0.908 0.881
DNN 3 layers 0.908 0.924 0.897
DNN 4 layers 0.924 0.931 0.904
DNN 5 layers 0.921 0.932 0.927
NSL-KDD
DNN 1 layer 0.621 0.687 0.634
DNN 2 layers 0.637 0.712 0.641
DNN 3 layers 0.689 0.741 0.699
DNN 4 layers 0.684 0.775 0.734
DNN 5 layers 0.752 0.781 0.754

TABLE 14: Test results of host-based IDS

Method Accuracy (1-gram / 2-gram / 3-gram)
ADFA-LD
DNN 1 layer 0.885 0.887 0.894
DNN 2 layers 0.886 0.894 0.897
DNN 3 layers 0.894 0.899 0.904
DNN 4 layers 0.891 0.901 0.912
DNN 5 layers 0.895 0.902 0.917
DNN 5 layers - Keras embedding 0.921
SVM + tf-idf 0.887
ADFA-WD
DNN 1 layer 0.741 0.764 0.774
DNN 2 layers 0.765 0.757 0.785
DNN 3 layers 0.784 0.778 0.794
DNN 4 layers 0.788 0.789 0.801
DNN 5 layers 0.791 0.795 0.812
DNN 5 layers - Keras embedding 0.834
SVM + tf-idf 0.801

TABLE 15: Detailed test results for Multi-class classification - KDDCup 99


Method Normal DoS Probe R2L U2R
TPR FPR Acc TPR FPR Acc TPR FPR Acc TPR FPR Acc TPR FPR Acc
LR 0.978 0.229 0.811 0.798 0.058 0.832 0.001 0.0 0.987 0.014 0.0 1.0 0.063 0.0 0.974
NB 0.517 0.01 0.898 0.99 0.503 0.874 0.0 0.004 0.983 0.514 0.012 0.988 0.001 0.0 0.972
KNN 0.267 0.267 0.642 0.721 0.72 0.617 0.014 0.012 0.975 0.0 0.0 1.0 0.0 0.0 0.972
DT 0.261 0.261 0.646 0.722 0.722 0.617 0.014 0.013 0.974 0.0 0.002 0.998 0.002 0.002 0.971
AB 0.926 0.925 0.24 0.056 0.055 0.266 0.017 0.014 0.973 0.0 0.001 0.999 0.004 0.003 0.969
RF 0.267 0.268 0.642 0.72 0.719 0.616 0.012 0.011 0.976 0.0 0.0 1.0 0.002 0.002 0.971
SVM-rbf 0.298 0.297 0.624 0.693 0.692 0.602 0.009 0.009 0.977 0.0 0.0 1.0 0.0 0.0 0.972
DNN 1 layer 0.994 0.088 0.928 0.939 0.004 0.953 0.732 0.001 0.995 0.243 0.0 1.0 0.155 0.002 0.975
DNN 2 layer 0.995 0.088 0.928 0.942 0.004 0.955 0.764 0.002 0.995 0.243 0.0 1.0 0.089 0.0 0.975
DNN 3 layer 0.981 0.068 0.942 0.962 0.031 0.963 0.706 0.002 0.994 0.0 0.0 1.0 0.0 0.002 0.971
DNN 4 layer 0.946 0.067 0.935 0.961 0.052 0.958 0.765 0.004 0.993 0.0 0.0 1.0 0.0 0.001 0.972
DNN 5 layer 0.991 0.086 0.929 0.939 0.004 0.953 0.752 0.002 0.994 0.043 0.0 1.0 0.136 0.003 0.974

TABLE 16: Detailed test results for Multi-class classification - NSL-KDD


Method Normal DoS Probe R2L U2R
TPR FPR Acc TPR FPR Acc TPR FPR Acc TPR FPR Acc TPR FPR Acc
LR 0.919 0.496 0.682 0.638 0.159 0.77 0.0 0.0 0.899 0.0 0.0 0.997 0.0 0.0 0.879
NB 0.0319 0.085 0.534 0.827 0.861 0.371 0.0 0.008 0.886 0.179 0.069 0.938 0.0 0.016 0.865
KNN 0.977 0.389 0.769 0.731 0.029 0.889 0.590 0.032 0.929 0.119 0.0 0.997 0.0 0.0 0.879
DT 0.975 0.334 0.799 0.788 0.019 0.917 0.605 0.038 0.93 0.419 0.004 0.998 0.077 0.0 0.882
AB 0.631 0.250 0.698 0.861 0.295 0.757 0.336 0.043 0.89 0.438 0.0 0.999 0.168 0.0 0.898
RF 0.976 0.388 0.768 0.758 0.015 0.908 0.648 0.016 0.97 0.522 0.0 0.997 0.049 0.0 0.885
SVM-rbf 0.998 0.507 0.712 0.644 0.0 0.875 0.512 0.0 0.942 0.0 0.0 0.997 0.0 0.0 0.8799
DNN 1 layer 0.973 0.279 0.829 0.777 0.027 0.907 0.61 0.024 0.936 0.433 0.0 0.998 0.241 0.026 0.886
DNN 2 layers 0.971 0.257 0.841 0.764 0.019 0.907 0.703 0.047 0.926 0.224 0.0 0.997 0.199 0.024 0.883
DNN 3 layers 0.973 0.264 0.838 0.772 0.021 0.909 0.636 0.038 0.927 0.239 0.0 0.997 0.26 0.024 0.89
DNN 4 layers 0.974 0.276 0.832 0.764 0.015 0.91 0.663 0.055 0.915 0.672 0.002 0.998 0.242 0.003 0.906
DNN 5 layers 0.974 0.277 0.831 0.795 0.024 0.915 0.634 0.041 0.924 0.269 0.0 0.998 0.229 0.005 0.903

TABLE 17: Detailed test results for Multi-class classification - WSN-DS


Method Normal Blackhole Grayhole Flooding Scheduling
TPR FPR Acc TPR FPR Acc TPR FPR Acc TPR FPR Acc TPR FPR Acc
LR 0.998 0.147 0.934 0.159 0.048 0.844 0.609 0.145 0.807 0.755 0.0 0.988 0.702 0.0 0.973
NB 0.946 0.066 0.946 0.989 0.146 0.87 0.478 0.039 0.875 0.8385 0.017 0.976 0.288 0.0 0.937
KNN 0.992 0.341 0.838 0.485 0.011 0.929 0.387 0.0888 0.809 0.403 0.007 0.969 0.664 0.0 0.96
DT 0.996 0.054 0.97 0.933 0.004 0.98 0.945 0.014 0.981 0.702 0.0 0.986 0.939 0.0 0.996
AB 0.999 0.085 0.964 0.885 0.008 0.977 0.839 0.019 0.957 0.984 0.0 0.999 0.847 0.0 0.98
RF 0.999 0.034 0.98 0.961 0.0 0.991 0.958 0.0 0.986 0.837 0.0 0.998 0.942 0.0 0.997
SVM-rbf 0.999 0.922 0.575 0.0 0.0 0.866 0.142 0.0 0.833 0.014 0.0 0.956 0.075 0. 0.918
DNN 1 layer 0.998 0.027 0.98 0.965 0.078 0.939 0.616 0.007 0.919 0.978 0.005 0.994 0.917 0.0 0.992
DNN 2 layers 0.999 0.091 0.957 0.754 0.073 0.904 0.538 0.046 0.87 0.754 0.0 0.989 0.937 0.0 0.993
DNN 3 layers 0.999 0.107 0.953 0.883 0.037 0.956 0.666 0.026 0.916 0.776 0.0 0.987 0.916 0.0 0.992
DNN 4 layers 0.998 0.145 0.933 0.862 0.071 0.92 0.474 0.033 0.873 0.648 0.0 0.984 0.796 0.0 0.983
DNN 5 layers 0.994 0.047 0.975 0.946 0.069 0.939 0.676 0.019 0.925 0.819 0.0 0.998 0.879 0.0 0.988


TABLE 18: Detailed test results for Multi-class classification - UNSW-NB15


Method Normal Fuzzers Analysis Backdoors
TPR FPR Acc TPR FPR Acc TPR FPR Acc TPR FPR Acc
LR 0.954 0.376 0.729 0.0 0.0 0.999 0.0 0.0 0.995 0.0 0.0 0.941
NB 0.527 0.025 0.837 0.086 0.005 0.997 0.018 0.0 0.9866 0.465 0.276 0.708
KNN 0.936 0.309 0.769 0.0 0.0 0.999 0.035 0.0 0.993 0.349 0.008 0.952
DT 0.967 0.134 0.899 0.279 0.0 0.999 0.447 0.0 0.994 0.713 0.0 0.976
AB 0.992 0.425 0.707 0.0 0.0 0.999 0.0 0.0 0.993 0.0 0.0 0.941
RF 0.984 0.149 0.896 0.146 0.0 0.999 0.555 0.0 0.995 0.718 0.0 0.979
SVM-rbf 0.998 0.532 0.637 0.0 0.0 0.999 0.017 0.0 0.996 0.309 0.0 0.958
DNN 1 layer 0.897 0.289 0.775 0.0 0.0 0.999 0.0 0.0 0.995 0.335 0.0 0.957
DNN 2 layers 0.947 0.299 0.784 0.0 0.0 0.999 0.0 0.0 0.995 0.3355 0.0 0.958
DNN 3 layers 0.956 0.312 0.777 0.0 0.0 0.999 0.076 0.0 0.993 0.335 0.0 0.957
DNN 4 layers 0.915 0.264 0.797 0.0 0.0 0.999 0.0 0.0 0.995 0.345 0.0 0.957
DNN 5 layers 0.928 0.285 0.789 0.0 0.0 0.999 0.0 0.0 0.995 0.344 0.013 0.951
Algorithm DoS Exploits Generic Reconnaissance
TPR FPR Acc TPR FPR Acc TPR FPR Acc TPR FPR Acc
LR 0.984 0.245 0.806 0.011 0.0 0.897 0.038 0.019 0.807 0.0 0.0 0.928
NB 0.985 0.335 0.737 0.1238 0.013 0.897 0.028 0.0 0.811 0.0 0.0 0.928
KNN 0.977 0.0 0.989 0.088 0.0367 0.878 0.248 0.084 0.7885 0.338 0.053 0.903
DT 0.986 0.048 0.959 0.149 0.024 0.889 0.509 0.029 0.885 0.579 0.088 0.888
AB 0.967 0.0 0.986 0.0 0.0 0.8961 0.269 0.0734 0.805 0.271 0.037 0.914
RF 0.984 0.025 0.983 0.112 0.002 0.9 0.616 0.049 0.887 0.579 0.083 0.897
SVM-rbf 0.926 0.0 0.982 0.0128 0.0 0.895 0.068 0.024 0.806 0.25 0.039 0.916
DNN 1 layer 0.978 0.0 0.999 0.026 0.0 0.895 0.577 0.176 0.776 0.035 0.0 0.926
DNN 2 layers 0.978 0.0 0.997 0.017 0.0 0.898 0.577 0.155 0.797 0.045 0.0 0.925
DNN 3 layers 0.977 0.0 0.993 0.016 0.0 0.894 0.563 0.144 0.803 0.032 0.0 0.929
DNN 4 layers 0.979 0.0 0.992 0.0107 0.0 0.894 0.617 0.180 0.784 0.028 0.0 0.928
DNN 5 layers 0.977 0.0 0.994 0.013 0.0 0.899 0.571 0.166 0.783 0.018 0.008 0.927
Algorithm Shell code Worms
TPR FPR Acc TPR FPR Acc
LR 0.0 0.0 0.99 0.0 0.0 0.988
NB 0.0 0.0 0.99 0.0 0.0 0.988
KNN 0.0 0.0 0.989 0.0115 0.0 0.986
DT 0.062 0.0 0.99 0.0 0.0 0.988
AB 0.0 0.0 0.99 0.0 0.0 0.986
RF 0.039 0.0 0.994 0.0 0.0 0.988
SVM-rbf 0.0 0.0 0.99 0.0 0.0 0.988
DNN 1 layer 0.0 0.0 0.99 0.0 0.0 0.988
DNN 2 layers 0.0 0.0 0.99 0.0 0.0 0.988
DNN 3 layers 0.0 0.0 0.99 0.0 0.0 0.988
DNN 4 layers 0.0 0.0 0.99 0.0 0.0 0.988
DNN 5 layers 0.0 0.0 0.99 0.0 0.0 0.988

representation on both HIDS datasets, as shown in Table 14. Generally, in DNNs, the network connection records are propagated through more than one hidden layer to learn the optimal features. Each hidden layer aims at mapping the data into a higher dimension, and each layer facilitates understanding the significant features for classifying a network connection as either Normal or Attack and for categorizing an attack into its attack category. To understand, visualize and analyse the results, the activation values are passed into t-SNE [63], which converts the high dimensional activation values into a low dimensional representation using Principal Component Analysis (PCA). The low dimensional feature representation for KDDCup 99, NSL-KDD and Kyoto is shown in Figure 8c, Figure 8d and Figure 4c respectively. The connection records of 'Normal', 'DoS' and 'Probe' appear in completely different clusters for KDDCup 99. This shows that the DNN has learnt patterns which can distinguish the connection records of 'Normal', 'DoS' and 'Probe'. It has not completely learnt the optimal features to distinguish the connection records of 'U2R' and 'R2L', which is one of the reasons why a few connection records of 'U2R' and 'R2L' appear in the 'Probe' attack cluster.
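The visualization pipeline described above (penultimate-layer activations, reduced with PCA, then embedded in 2-D with t-SNE) can be sketched as follows. This is a minimal, self-contained illustration: the activation matrix here is randomly generated stand-in data, not the output of the paper's DNN, and scikit-learn is assumed for PCA and t-SNE.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-in for penultimate-layer activations of connection records:
# three synthetic clusters playing the roles of Normal, DoS and Probe.
activations = np.vstack([
    rng.normal(loc=c, scale=0.5, size=(100, 128)) for c in (0.0, 3.0, 6.0)
])
labels = np.repeat(["Normal", "DoS", "Probe"], 100)

# Reduce dimensionality with PCA first, then embed into 2-D with t-SNE,
# mirroring the t-SNE-with-PCA pipeline described in the text.
reduced = PCA(n_components=30, random_state=0).fit_transform(activations)
embedded = TSNE(n_components=2, perplexity=30, init="pca",
                random_state=0).fit_transform(reduced)
print(embedded.shape)  # one 2-D point per connection record
```

Plotting `embedded` coloured by `labels` (e.g. with matplotlib) yields the kind of cluster view discussed for Figures 8c, 8d and 4c.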

TABLE 19: Detailed test results for Multi-class classification - CICIDS 2017
Method Normal SSH-Patator FTP-Patator
TPR FPR Acc TPR FPR Acc TPR FPR Acc
LR 0.905 0.161 0.885 0.499 0.0 0.984 0.488 0.0 0.982
NB 0.0366 0.0 0.322 0.998 0.25 0.757 0.979 0.317 0.6926
KNN 0.884 0.033 0.909 0.979 0.029 0.97 0.945 0.0 0.995
DT 0.928 0.028 0.939 0.941 0.0 0.996 0.972 0.0 0.999
AB 0.693 0.378 0.678 0.480 0.058 0.927 0.984 0.0 0.965
RF 0.935 0.035 0.944 0.951 0.0 0.999 0.973 0.0 0.998
SVM-rbf 0.999 0.681 0.798 0.450 0.0 0.988 0.382 0.0 0.979
DNN 1 layer 0.646 0.645 0.559 0.0 0.018 0.959 0.031 0.040 0.928
DNN 2 layers 0.644 0.642 0.558 0.0 0.0 0.969 0.038 0.040 0.928
DNN 3 layers 0.651 0.648 0.561 0.0 0.0 0.963 0.031 0.041 0.927
DNN 4 layers 0.638 0.639 0.555 0.0227 0.0172 0.958 0.031 0.040 0.928
DNN 5 layers 0.668 0.666 0.568 0.0156 0.014 0.958 0.031 0.040 0.928
Algorithm Web Bot DDoS PortScan
TPR FPR Acc TPR FPR Acc TPR FPR Acc TPR FPR Acc
LR 0.806 0.0 0.998 0.0 0.0 0.982 0.934 0.0 0.992 0.994 0.078 0.926
NB 0.882 0.019 0.979 0.995 0.012 0.988 0.908 0.985 0.129 0.065 0.879
KNN 0.813 0.0 0.997 0.995 0.040 0.960 0.979 0.0 0.995 1.0 0.0 0.996
DT 0.868 0.0 0.995 0.998 0.038 0.966 0.998 0.0 0.999 1.0 0.0 0.997
AB 0.0 0.0 0.993 0.0 0.100 0.884 0.0 0.0 0.929 0.995 0.04705 0.955
RF 0.841 0.0 0.996 0.998 0.039 0.961 0.995 0.0 0.999 1.0 0.0 0.998
SVM-rbf 0.781 0.0 0.996 0.0 0.0 0.982 0.0 0.0 0.929 0.993 0.0 0.99
DNN 1 layer 0.013 0.0 0.985 0.028 0.021 0.961 0.095 0.088 0.854 0.093 0.092 0.852
DNN 2 layers 0.013 0.011 0.982 0.033 0.032 0.951 0.095 0.088 0.854 0.090 0.090 0.853
DNN 3 layers 0.020 0.007 0.986 0.041 0.034 0.949 0.0951 0.088 0.854 0.090 0.089 0.854
DNN 4 layers 0.013 0.009 0.984 0.033 0.0334 0.950 0.0964 0.088 0.854 0.096 0.088 0.
DNN 5 layers 0.017 0.009 0.988 0.028 0.028 0.959 0.094 0.086 0.855 0.089 0.089 0.855
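The per-class TPR, FPR and accuracy reported in Tables 15-19 follow from a one-vs-rest reading of the confusion matrix. A minimal sketch of that computation, on toy labels rather than the paper's data:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy ground truth and predictions (not the paper's data).
y_true = ["Normal", "DoS", "Probe", "Normal", "DoS", "Normal", "Probe"]
y_pred = ["Normal", "DoS", "Normal", "Normal", "Probe", "Normal", "Probe"]
classes = ["Normal", "DoS", "Probe"]

cm = confusion_matrix(y_true, y_pred, labels=classes)
for i, cls in enumerate(classes):
    tp = cm[i, i]                # this class predicted correctly
    fn = cm[i, :].sum() - tp     # this class predicted as something else
    fp = cm[:, i].sum() - tp     # other classes predicted as this class
    tn = cm.sum() - tp - fn - fp
    tpr = tp / (tp + fn)         # per-class true positive rate
    fpr = fp / (fp + tn)         # per-class false positive rate
    acc = (tp + tn) / cm.sum()   # per-class (one-vs-rest) accuracy
    print(f"{cls}: TPR={tpr:.3f} FPR={fpr:.3f} Acc={acc:.3f}")
```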

This shows that the attacks of 'Probe', 'U2R' and 'R2L' have common characteristics. For NSL-KDD, the DNN had the same issue as with KDDCup 99. For the Kyoto dataset, a few attacks appear in the clusters of normal connection records; this shows that they have similar characteristics, and additional features are required to classify them correctly.

To know the importance of each feature as well as to identify the significant features, the methodology of [69] has been followed. This uses a Taylor expansion of the features of the penultimate layer and finds the first order partial derivative of the classification results before placing them through the softmax function. This helps to detect the significant features for distinguishing a connection record as either Normal or Attack and for categorizing an attack into its categories. The connection record which belongs to 'R2L' is shown in Figure 8a; its features have similar characteristics to those of 'U2R'. For CICIDS 2017, the connection record which belongs to 'DoS' is shown in Figure 8b; the features have similar characteristics between 'DoS' and 'DDoS'. This shows that the dataset requires a few more additional features to classify a connection record as 'DoS' or 'DDoS' correctly.

Both the classical machine learning classifiers and the DNNs performed well on KDDCup 99 in comparison to NSL-KDD. The NSL-KDD is a refined version of the KDDCup 99 dataset; thus, it has a unique set of train and test connection records. Moreover, the connection records of NSL-KDD are highly non-linearly separable in comparison to those of KDDCup 99, and the performance of both the classical machine learning classifiers and the DNNs is considerably lower in comparison to KDDCup 99 and NSL-KDD. On the subset of CICIDS 2017, both the classical machine learning classifiers and the DNNs performed well. Thus, the proposed SHIA architecture in this work can work well in real time. The performance of the DNNs can be enhanced by carefully following hyperparameter tuning techniques for NSL-KDD and UNSW-NB15.

The detailed test results for the ADFA-LD and ADFA-WD datasets are reported in Table 14. N-gram and Keras embedding text representation methods are used to transform system calls into numeric vectors for the DNNs. For a comparative study, tf-idf is used as the system call representation method with a classical machine learning classifier, SVM. The performance of N-gram and Keras embedding is good in comparison to tf-idf. This is due to the fact that both N-gram and Keras embedding have the capability to preserve the sequence information of the system calls. Moreover, the Keras embedding performed well over the N-gram representation method.
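The contrast between the sparse N-gram/tf-idf representations and the dense, trainable Keras embedding can be sketched as follows, on toy system-call traces (not the ADFA data). The integer sequences at the end are the form of input a trainable embedding layer such as `keras.layers.Embedding` would consume; the traces and vocabulary here are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Toy system-call traces as space-separated call names (not ADFA data).
traces = ["open read write close", "open write write close", "fork exec wait"]

# 1- to 3-gram counts: a wide, sparse term-document matrix.
ngram = CountVectorizer(ngram_range=(1, 3)).fit_transform(traces)
tfidf = TfidfVectorizer().fit_transform(traces)
print(ngram.shape, tfidf.shape)  # n-grams add many (mostly zero) columns

# Integer-encoded sequences: the input an embedding layer would consume.
# Unlike the fixed n-gram/tf-idf matrices, the embedding weights are
# updated during backpropagation, so the representation is learned.
vocab = {call: i + 1 for i, call in
         enumerate(sorted({c for t in traces for c in t.split()}))}
sequences = [[vocab[c] for c in t.split()] for t in traces]
print(sequences)
```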

This is due to the fact that the embedding facilitates capturing the relations among system calls. To choose the best value of N in the N-gram representation, the experiments are run with 1-, 2- and 3-grams. When N is increased from 3 to 4, the performance reduces; the experiments with the 3-gram representation performed better than those with 1-gram and 2-gram. However, the N-gram and tf-idf representations produce a very large matrix, which in some cases can be sparse; thus, with any classifier it can be very difficult to achieve the best performance on this sparse representation. An additional advantage of Keras embedding is that the weights of the embedding layer are updated during backpropagation.

B. IMPORTANCE OF MINIMAL FEATURE SETS

Feature selection is an important step for intrusion detection, in order to identify the various types of attacks more accurately. Without feature selection, there is a possibility of misclassifying attacks, and it would take a long time to train a model [70]. The significance of the feature selection method for intrusion detection was discussed in detail using the NSL-KDD dataset [71]. They reported that the feature selection method significantly reduces the training and testing time and also improves the intrusion detection rate. To evaluate the performance of the various DNN topologies and classical machine learning classifiers, two trials of experiments are run on minimal feature sets of KDDCup 99 and NSL-KDD [3]. The detailed results are reported in Table 13. The experiments with the 11 and 8 feature sets performed well in comparison to the experiments with the 4 feature set. Moreover, the experiments with the 11 feature set performed well in comparison to the 8 feature set, though the difference in performance between the 11 and 8 minimal feature sets is marginal.

X. CONCLUSIONS AND FUTURE WORK

In this paper, we proposed a hybrid intrusion detection alert system using a highly scalable framework on a commodity hardware server which has the capability to analyze network- and host-level activities. The framework employs a distributed deep learning model with DNNs for handling and analyzing very large scale data in real time. The DNN model was chosen by comprehensively evaluating its performance in comparison to classical machine learning classifiers on various benchmark IDS datasets. In addition, we collected host-based and network-based features in real time and employed the proposed DNN model for detecting attacks and intrusions. In all the cases, we observed that the DNNs exceeded the classical machine learning classifiers in performance. Our proposed architecture is able to perform better than previously implemented classical machine learning classifiers in both HIDS and NIDS. To the best of our knowledge, this is the only framework which has the capability to collect network-level and host-level activities in a distributed manner using DNNs to detect attacks more accurately.

The performance of the proposed framework can be further enhanced by adding a module for monitoring the DNS and BGP events in the networks. The execution time of the proposed system can be enhanced by adding more nodes to the existing cluster. In addition, the proposed system does not give detailed information on the structure and characteristics of the malware. Overall, the performance can be further improved by training complex DNN architectures on advanced hardware through a distributed approach. Due to the extensive computational cost associated with complex DNN architectures, they were not trained in this research using the benchmark IDS datasets. This will be an important task in an adversarial environment and is considered one of the significant directions for future work.

REFERENCES

[1] Mukherjee, B., Heberlein, L. T., & Levitt, K. N. (1994). Network intrusion detection. IEEE Network, 8(3), 26-41.
[2] Larson, D. (2016). Distributed denial of service attacks - holding back the flood. Network Security, 2016(3), 5-7.
[3] Staudemeyer, R. C. (2015). Applying long short-term memory recurrent neural networks to intrusion detection. South African Computer Journal, 56(1), 136-154.
[4] Venkatraman, S., & Alazab, M. (2018). Use of Data Visualisation for Zero-Day Malware Detection. Security and Communication Networks, vol. 2018, Article ID 1728303, 13 pages. https://doi.org/10.1155/2018/1728303
[5] Mishra, P., Varadharajan, V., Tupakula, U., & Pilli, E. S. (2018). A detailed investigation and analysis of using machine learning techniques for intrusion detection. IEEE Communications Surveys & Tutorials.
[6] Azab, A., Alazab, M., & Aiash, M. (2016). Machine Learning Based Botnet Identification Traffic. In The 15th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (Trustcom 2016), Tianjin, China, 23-26 August, pp. 1788-1794.
[7] Vinayakumar R. (2019, January 19). vinayakumarr/Intrusion-detection v1 (Version v1). Zenodo. http://doi.org/10.5281/zenodo.2544036
[8] Tang, M., Alazab, M., Luo, Y., & Donlon, M. (2018). Disclosure of cyber security vulnerabilities: time series modelling. International Journal of Electronic Security and Digital Forensics, Vol. 10, No. 3, pp. 255-275.
[9] Paxson, V. (1999). Bro: A system for detecting network intruders in real-time. Computer Networks, vol. 31, no. 23, pp. 2435-2463. DOI http://dx.doi.org/10.1016/S1389-1286(99)00112-7
[10] LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436.
[11] Xin, Y., Kong, L., Liu, Z., Chen, Y., Li, Y., Zhu, H., ... & Wang, C. (2018). Machine Learning and Deep Learning Methods for Cybersecurity. IEEE Access.
[12] Hofmeyr, S. A., Forrest, S., & Somayaji, A. (1998). Intrusion detection using sequences of system calls. Journal of Computer Security, 6(3), 151-180.
[13] Forrest, S., Hofmeyr, S. A., Somayaji, A., & Longstaff, T. A. (1996, May). A sense of self for Unix processes. In Security and Privacy, 1996, Proceedings of the 1996 IEEE Symposium on (pp. 120-128). IEEE.
[14] Hubballi, N., Biswas, S., & Nandi, S. (2011, January). Sequencegram: n-gram modeling of system calls for program based anomaly detection. In Communication Systems and Networks (COMSNETS), 2011 Third International Conference on (pp. 1-10). IEEE.
[15] Hubballi, N. (2012, January). Pairgram: Modeling frequency information of lookahead pairs for system call based anomaly detection. In Communication Systems and Networks (COMSNETS), 2012 Fourth International Conference on (pp. 1-10). IEEE.
[16] Kozushko, H. (2003). Intrusion detection: Host-based and network-based intrusion detection systems. On September, 11.
[17] Lee, W., & Stolfo, S. (2000). A framework for constructing features and models for intrusion detection systems. ACM Transactions on Information and System Security, vol. 3, no. 4, pp. 227-261. DOI http://dx.doi.org/10.1145/382912.382914
[18] Ozgur, A., & Erdem, H. (2016). A review of KDD99 dataset usage in intrusion detection and machine learning between 2010 and 2015. PeerJ PrePrints, 4, e1954.



R. VINAYAKUMAR has been a Ph.D. student in Computational Engineering & Networking at Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, India, since July 2015. He received his BCA from JSS College of Arts, Commerce and Sciences, Ooty Road, Mysore, in 2011 and his MCA from Amrita Vishwa Vidyapeetham, Mysore, in 2014. He has several papers on machine learning applied to cyber security. His Ph.D. work centers on the application of machine learning (and at times deep learning) to cyber security, and discusses the importance of natural language processing, image processing, and big data analytics for cyber security. He has participated in several international shared tasks and organized a shared task on detecting malicious domain names (DMD 2018) as part of SSCC'18 and ICACCI'18. More details are available at https://vinayakumarr.github.io/

MAMOUN ALAZAB received his PhD degree in Computer Science from Federation University of Australia, School of Science, Information Technology and Engineering. He is an Associate Professor in the College of Engineering, IT and Environment at Charles Darwin University, Australia. He is a cyber security researcher and practitioner with industry and academic experience. Alazab's research is multidisciplinary, focusing on cyber security and digital forensics of computer systems, including current and emerging issues in the cyber environment such as cyber-physical systems and the Internet of Things, with a focus on cybercrime detection and prevention. He has more than 100 research papers. He has delivered many invited and keynote speeches, including 22 events in 2018 alone. He has convened and chaired more than 50 conferences and workshops. He works closely with government and industry on many projects. He serves on multiple editorial boards, including as Associate Editor of IEEE Access (2017 Impact Factor 3.5), Editor of the Security and Communication Networks journal (2016 Impact Factor 1.067), and Book Review Section Editor of the Journal of Digital Forensics, Security and Law (JDFSL). He is a Senior Member of the IEEE.

K. P. SOMAN has 25 years of research and teaching experience at Amrita School of Engineering, Coimbatore. He has around 150 publications in national and international journals and conference proceedings. He has organized a series of workshops and summer schools on advanced signal processing using wavelets, kernel methods for pattern classification, deep learning, and big data analytics for industry and academia. He authored the books "Insight into Wavelets", "Insight into Data Mining", "Support Vector Machines and Other Kernel Methods", and "Signal and Image Processing - The Sparse Way", published by Prentice Hall, New Delhi, and Elsevier. More details are available at https://nlp.amrita.edu/somankp/


PRABAHARAN POORNACHANDRAN is a professor at Amrita Vishwa Vidyapeetham. He has more than two decades of experience in the computer science and security areas. His areas of interest are malware, critical infrastructure security, complex binary analysis, AI, and machine learning.

AMEER AL-NEMRAT is a senior lecturer (Associate Professor) at the School of Architecture, Computing and Engineering, University of East London (UEL). He is the leader of the Professional Doctorate in IS and the M.Sc. Information Security and Computer Forensics programmes. He is also the founder and director of the Electronic Evidence Laboratory, UEL, where he works closely on cybercrime projects with different UK law enforcement agencies. He is research active in the areas of security, cybercrime, and digital forensics, and has published research papers in peer-reviewed conferences and internationally reputed journals.

SITALAKSHMI VENKATRAMAN earned her PhD in Computer Science, with a doctoral thesis titled "Efficient Parallel Algorithms for Pattern Recognition", from the National Institute of Industrial Engineering in 1993. Prior to that, she was awarded an MSc (Mathematics) in 1985 and an MTech (Computer Science) in 1987, both from the Indian Institute of Technology (Madras), and subsequently an MEd from the University of Sheffield in 2001.

Sita has more than 30 years of work experience in both industry and academia, developing turnkey projects for the IT industry and teaching a variety of IT courses at tertiary institutions in India, Singapore, New Zealand, and, since 2007, Australia. She is currently the Discipline Leader and Senior Lecturer in Information Technology at Melbourne Polytechnic. She specialises in applying efficient computing models and data mining techniques to various industry problems, recently in the e-health, e-security, and e-business domains, through collaborations with industry and universities in Australia. She has published seven book chapters and more than 130 research papers in internationally well-known refereed journals and conferences. She is a Senior Member of professional societies and editorial boards of international journals, and serves as a Program Committee Member of several international conferences every year.
