
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, NOVEMBER 2017

A Deep Learning Approach to Network Intrusion Detection

Nathan Shone, Tran Nguyen Ngoc, Vu Dinh Phai, Qi Shi

N. Shone and Q. Shi are with the Department of Computer Science at Liverpool John Moores University, Liverpool, UK. E-mail: {n.shone/q.shi}@ljmu.ac.uk
T. Ngoc and V. Phai are with the Department of Information Security at Le Quy Don Technical University, Hanoi, Vietnam. E-mail: [email protected], [email protected]
Manuscript submitted 30 June 2017.

Abstract—Network Intrusion Detection Systems (NIDSs) play a crucial role in defending computer networks. However, there are
concerns regarding the feasibility and sustainability of current approaches when faced with the demands of modern networks. More
specifically, these concerns relate to the increasing levels of required human interaction and the decreasing levels of detection
accuracy. This paper presents a novel deep learning technique for intrusion detection, which addresses these concerns. We detail our
proposed non-symmetric deep auto-encoder (NDAE) for unsupervised feature learning. Furthermore, we also propose our novel deep
learning classification model constructed using stacked NDAEs. Our proposed classifier has been implemented in GPU-enabled
TensorFlow and evaluated using the benchmark KDD Cup ’99 and NSL-KDD datasets. Promising results have been obtained from our
model thus far, demonstrating improvements over existing approaches and the strong potential for use in modern NIDSs.

Index Terms—deep learning, anomaly detection, auto-encoders, KDD, network security.

1 INTRODUCTION

ONE of the major challenges in network security is the provision of a robust and effective Network Intrusion Detection System (NIDS). Despite the significant advances in NIDS technology, the majority of solutions still operate using less-capable signature-based techniques, as opposed to anomaly detection techniques. There are several reasons for this reluctance to switch, including the high false error rate (and associated costs), difficulty in obtaining reliable training data, the longevity of training data and the behavioural dynamics of the system. The current situation will reach a point whereby reliance on such techniques leads to ineffective and inaccurate detection. The specifics of this challenge are to create a widely-accepted anomaly detection technique capable of overcoming the limitations induced by the ongoing changes occurring in modern networks.

We are concerned with three main limitations, which contribute to this network security challenge. The first is the drastic growth in the volume of network data, which is set to continue. This growth can be predominantly attributed to increasing levels of connectivity, the popularity of the Internet of Things and the extensive adoption of cloud-based services. Dealing with these volumes requires techniques that can analyse data in an increasingly rapid, efficient and effective manner. The second cause is the in-depth monitoring and granularity required to improve effectiveness and accuracy. NIDS analysis needs to be more detailed and contextually-aware, which means shifting away from abstract and high-level observations. For example, behavioural changes need to be easily attributable to specific elements of a network, e.g. individual users, operating system versions or protocols. The final cause is the number of different protocols and the diversity of data traversing through modern networks. This is possibly the most significant challenge, and it introduces high levels of difficulty and complexity when attempting to differentiate between normal and abnormal behaviour. It increases the difficulty in establishing an accurate norm and widens the scope for potential exploitation or zero-day attacks.

In recent years, one of the main focuses within NIDS research has been the application of machine learning and shallow learning techniques such as Naive Bayes, Decision Trees and Support Vector Machines (SVM) [1]. By and large, the application of these techniques has offered improvements in detection accuracy. However, there are limitations with these techniques, such as the comparatively high level of human expert interaction required; expert knowledge is needed to process data, e.g. identifying useful data and patterns. Not only is this a labour-intensive and expensive process, but it is also error prone [2]. Similarly, a vast quantity of training data is required for operation (with associated time overheads), which can become challenging in a heterogeneous and dynamic environment.

To address the above limitations, a research area currently receiving substantial interest across multiple domains is that of deep learning. This is an advanced subset of machine learning, which can overcome some of the limitations of shallow learning. Thus far, initial deep learning research has demonstrated that its superior layer-wise feature learning can better or at least match the performance of shallow learning techniques [3]. It is capable of facilitating a deeper analysis of network data and faster identification of any anomalies.

In this paper, we propose a novel deep learning model to enable NIDS operation within modern networks. The model we propose is a combination of deep and shallow learning, capable of correctly analysing a wide range of network traffic. More specifically, we combine the power of stacking our proposed non-symmetric deep auto-encoder (NDAE) (deep learning) and the accuracy and speed of Random Forest (RF) (shallow learning).
We have practically evaluated our model using GPU-enabled TensorFlow and obtained promising results from analysing the KDD Cup '99 and NSL-KDD datasets. We are aware of the limitations of these datasets, but they remain widely-used benchmarks amongst similar works, enabling us to draw direct comparisons.

This paper offers the following novel contributions:

• A new NDAE technique for unsupervised feature learning, which unlike typical auto-encoder approaches provides non-symmetric data dimensionality reduction. Hence, our technique is able to facilitate improved classification results when compared with leading methods such as Deep Belief Networks (DBNs).
• A novel classifier model that utilises stacked NDAEs and the RF classification algorithm. By combining both deep and shallow learning techniques to exploit their respective strengths and reduce analytical overheads, we are able to better or at least match results from similar research, whilst significantly reducing the training time.

The remainder of this paper is structured as follows. Section 2 presents relevant background information. Section 3 examines existing research. Section 4 specifies our proposed solution, which is subsequently evaluated in Section 5. Section 6 discusses our findings from the evaluation. Finally, the paper concludes in Section 7.

2 BACKGROUND

In this section, we will provide the background information necessary to understand our motivations and the concepts behind the model proposed in this paper.

2.1 NIDS challenges

Network monitoring has been used extensively for the purposes of security, forensics and anomaly detection. However, recent advances have created many new obstacles for NIDSs. Some of the most pertinent issues include:

• Volume - The volume of data both stored and passing through networks continues to increase. It is forecast that by 2020, the amount of data in existence will top 44ZB [4]. As such, the traffic capacity of modern networks has drastically increased to facilitate the volume of traffic observed. Many modern backbone links are now operating at wirespeeds of 100Gbps or more. To contextualise this, a 100Gbps link is capable of handling 148,809,524 packets per second [5]. Hence, to operate at wirespeed, a NIDS would need to be capable of completing the analysis of a packet within 6.72ns (a short worked derivation of these figures follows this list). Providing NIDS at such a speed is difficult, and ensuring satisfactory levels of accuracy, effectiveness and efficiency also presents a significant challenge.
• Accuracy - To maintain the aforementioned levels of accuracy, existing techniques cannot be relied upon. Therefore, greater levels of granularity, depth and contextual understanding are required to provide a more holistic and accurate view. Unfortunately, this comes with various financial, computational and time costs.
• Diversity - Recent years have seen an increase in the number of new or customised protocols being utilised in modern networks. This can be partially attributed to the number of devices with network and/or Internet connectivity. As a result, it is becoming increasingly difficult to differentiate between normal and abnormal traffic and/or behaviours.
• Dynamics - Given the diversity and flexibility of modern networks, their behaviour is dynamic and difficult to predict. In turn, this leads to difficulty in establishing a reliable behavioural norm. It also raises concerns as to the lifespan of learning models.
• Low-frequency attacks - These types of attacks have often thwarted previous anomaly detection techniques, including artificial intelligence approaches. The problem stems from imbalances in the training dataset, meaning that NIDSs offer weaker detection precision when faced with these types of low-frequency attacks.
• Adaptability - Modern networks have adopted many new technologies to reduce their reliance on static technologies and management styles. Therefore, there is more widespread usage of dynamic technologies such as containerisation, virtualisation and Software Defined Networks. NIDSs will need to be able to adapt to the usage of such technologies and the side effects they bring about.
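The wire-speed figures quoted in the Volume challenge above can be reproduced with a short calculation. It assumes minimum-size 64-byte Ethernet frames plus the standard 20 bytes of preamble and inter-frame gap (our assumption of the frame model behind [5], stated here for transparency):

```latex
\frac{100 \times 10^{9}\ \text{bit/s}}{(64 + 20)\ \text{bytes} \times 8\ \text{bit/byte}}
  = \frac{100 \times 10^{9}}{672} \approx 148{,}809{,}524\ \text{packets/s},
\qquad
\frac{1}{148{,}809{,}524\ \text{s}^{-1}} \approx 6.72\ \text{ns per packet}.
```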
2.2 Deep Learning

Deep learning is an advanced sub-field of machine learning, which advances machine learning closer to artificial intelligence. It facilitates the modelling of complex relationships and concepts [6] using multiple levels of representation. Supervised and unsupervised learning algorithms are used to construct successively higher levels of abstraction, defined using the output features from lower levels [7].

2.2.1 Auto-encoder

A popular technique currently utilised within deep learning research is the auto-encoder, which is utilised by our proposed solution (detailed in Section 4). An auto-encoder is an unsupervised neural network-based feature extraction algorithm, which learns the best parameters required to reconstruct its output as close to its input as possible. One of its desirable characteristics is the capability to provide a more powerful and non-linear generalisation than Principal Component Analysis (PCA).

This is achieved by applying backpropagation and setting the target values to be equal to the inputs. In other words, it is trying to learn an approximation to the identity function. An auto-encoder typically has an input layer, an output layer (with the same dimension as the input layer) and a hidden layer. This hidden layer normally has a smaller dimension than that of the input (known as an undercomplete or sparse auto-encoder). An example of an auto-encoder is shown in Fig. 1.

Most researchers [8], [9], [10] use auto-encoders as a non-linear transformation to discover interesting data structures,
by imposing other constraints on the network, and compare the results with those of PCA (a linear transformation). These methods are based on the encoder-decoder paradigm. The input is first transformed into a typically lower-dimensional space (encoder), and then expanded to reproduce the initial data (decoder). Once a layer is trained, its code is fed to the next, to better model highly non-linear dependencies in the input. This paradigm focuses on reducing the dimensionality of the input data. To achieve this, there is a special layer - the code layer [9] - at the centre of the deep auto-encoder structure. This code layer is used as a compressed feature vector for classification or for combination within a stacked auto-encoder [8].

Fig. 1. An example of a single auto-encoder (an input layer x_1 ... x_n, a smaller hidden layer, and an output layer x'_1 ... x'_n).

The hidden layer is used to create a lower dimensionality version of high dimensionality data (known as encoding). By reducing the dimensionality, the auto-encoder is forced to capture the most prominent features of the data distribution. In an ideal scenario, the data features generated by the auto-encoder will provide a better representation of the data points than the raw data itself.

The aim of the auto-encoder is to try and learn the function shown in equation (1):

h_{W,b}(x) \approx x \qquad (1)

Here, h is a non-linear hypothesis using the parameters W (weighting) and b (bias), which can fit the given data x. Simply, it tries to learn an approximation to the identity function, such that the output x' is most similar to x. The learning process is described as a reconstruction error minimisation function, as shown in equation (2):

L(x, d(f(x))) \qquad (2)

Here, L is a loss function penalising d(f(x)) for being dissimilar to x, d is a decoding function and f is an encoding function.
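As an illustration of equations (1) and (2), the following minimal sketch trains an undercomplete auto-encoder on arbitrary data. It is not taken from the paper's implementation; the layer sizes, optimiser and the use of the Keras API within TensorFlow are our own assumptions.

```python
import numpy as np
import tensorflow as tf

# Toy data standing in for pre-processed traffic records (values in [0, 1]).
x = np.random.rand(1000, 41).astype("float32")

# Undercomplete auto-encoder: 41 -> 14 -> 41.
# The hidden layer (encoder f) is smaller than the input, forcing compression;
# the output layer (decoder d) restores the input dimension.
inputs = tf.keras.Input(shape=(41,))
code = tf.keras.layers.Dense(14, activation="sigmoid")(inputs)   # f(x)
outputs = tf.keras.layers.Dense(41, activation="sigmoid")(code)  # d(f(x))
autoencoder = tf.keras.Model(inputs, outputs)

# Equation (2): minimise a loss L(x, d(f(x))); mean squared error is one choice.
autoencoder.compile(optimizer="adam", loss="mse")

# Equation (1): the training targets are the inputs themselves, h_{W,b}(x) ~ x.
autoencoder.fit(x, x, epochs=10, batch_size=32, verbose=0)
```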
2.2.2 Stacked auto-encoder

Unlike a simple auto-encoder, a deep auto-encoder is composed of two symmetrical deep-belief networks, which typically have four or five shallow layers for encoding, and a second set of four or five layers for decoding. The work by Hinton and Salakhutdinov [9] has produced promising results by implementing a deep learning algorithm to convert high dimensional data to low dimensional data by utilising a deep auto-encoder.

Deep learning can be applied to auto-encoders, whereby the hidden layers are the simple concepts and multiple hidden layers are used to provide depth, in a technique known as a stacked auto-encoder. This increased depth can reduce computational costs and the amount of required training data, as well as yielding greater degrees of accuracy [6]. The output from each hidden layer is used as the input for a progressively higher level. Hence, the first layer of a stacked auto-encoder usually learns first-order features in the raw input. The second layer usually learns second-order features relating to patterns in the appearance of the first-order features. Subsequent higher layers learn higher-order features. An illustrative example of a stacked auto-encoder is shown in Fig. 2, where the superscript numbers refer to the hidden layer identity and the subscript numbers signify the dimension for that layer.

Fig. 2. An example of a stacked auto-encoder (an input layer x_1 ... x_7 feeds auto-encoder 1 with hidden units h^(1), whose outputs feed auto-encoder 2 with hidden units h^(2), followed by an output classifier).
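To illustrate the layer-wise scheme just described, the sketch below greedily trains two auto-encoders, feeding the code of the first to the second. The layer sizes, epochs and optimiser are illustrative assumptions, not prescriptions from the text.

```python
import numpy as np
import tensorflow as tf

def train_autoencoder(data, code_dim, epochs=10):
    """Train one undercomplete auto-encoder and return only its encoder."""
    inputs = tf.keras.Input(shape=(data.shape[1],))
    code = tf.keras.layers.Dense(code_dim, activation="sigmoid")(inputs)
    recon = tf.keras.layers.Dense(data.shape[1], activation="sigmoid")(code)
    model = tf.keras.Model(inputs, recon)
    model.compile(optimizer="adam", loss="mse")
    model.fit(data, data, epochs=epochs, batch_size=32, verbose=0)
    return tf.keras.Model(inputs, code)  # keep the encoding stage f

x = np.random.rand(1000, 41).astype("float32")  # stand-in data

# Greedy layer-wise stacking: the code of auto-encoder 1 becomes
# the training input of auto-encoder 2, and so on for deeper stacks.
encoder1 = train_autoencoder(x, code_dim=20)
features1 = encoder1.predict(x, verbose=0)            # first-order features
encoder2 = train_autoencoder(features1, code_dim=10)
features2 = encoder2.predict(features1, verbose=0)    # second-order features
```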
3 EXISTING WORK

Deep learning is garnering significant interest and its application is being investigated within many research domains, such as: healthcare [11], [12]; automotive design [13], [14]; manufacturing [15] and law enforcement [16], [17].

There are also several existing works within the domain of NIDS. In this section, we will discuss the most notable current works.

Dong and Wang undertook a literary and experimental comparison between the use of specific traditional NIDS techniques and deep learning methods [1]. The authors concluded that the deep learning-based methods offered improved detection accuracy across a range of sample sizes and traffic anomaly types. The authors also demonstrated that problems associated with imbalanced datasets can be overcome by using oversampling, for which they used the Synthetic Minority Oversampling Technique (SMOTE).

Zhao et al. [2] presented a state-of-the-art survey of deep learning applications within machine health monitoring. They experimentally compared conventional machine learning methods against four common deep learning methods (auto-encoders, Restricted Boltzmann Machine (RBM), Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN)). Their work concluded that deep learning methods offer better accuracy than conventional methods.

Our literature review identified several proposed deep learning methods designed specifically for NIDSs.

Alrawashdeh and Purdy [18] proposed using an RBM with one hidden layer to perform unsupervised feature reduction. The weights are passed to another RBM to produce a DBN. The pre-trained weights are passed into a fine-tuning layer consisting of a Logistic Regression classifier (trained with 10 epochs) with multi-class soft-max. The proposed solution was evaluated using the KDD Cup '99 dataset. The authors claimed a detection rate of 97.90% and a false negative rate of 2.47%. This is an improvement over results claimed by authors of similar papers.

The work by Kim et al. [19] aspired to specifically target advanced persistent threats. They propose a Deep Neural Network (DNN) using 100 hidden units, combined with the Rectified Linear Unit activation function and the ADAM optimiser. Their approach was implemented on a GPU using TensorFlow, and evaluated using the KDD data set. The authors claimed an average accuracy rate of 99%, and summarised that both RNN and Long Short-Term Memory (LSTM) models are needed for improving future defences.

Javaid et al. [20] propose a deep learning based approach to building an effective and flexible NIDS. Their method is referred to as self-taught learning (STL), which combines a sparse auto-encoder with softmax regression. They have implemented their solution and evaluated it against the benchmark NSL-KDD dataset. The authors claim some promising levels of classification accuracy in both binary and 5-class classification. Their results show that their 5-class classification achieved an average f-score of 75.76%.

Potluri and Diedrich [21] propose a method using 41 features, and their DNN has 3 hidden layers (2 auto-encoders and 1 soft-max). The results obtained were mixed; those focusing on fewer classes were more accurate than those with more classes. The authors attributed this to insufficient training data for some classes.

Cordero et al. [22] proposed an unsupervised method to learn models of normal network flows. They use the RNN, auto-encoder and dropout concepts of deep learning. The exact accuracy of their proposed method is not fully disclosed.

Similarly, Tang et al. [23] also propose a method to monitor network flow data. The paper lacked details about its exact algorithms, but does present an evaluation using the NSL-KDD dataset, which the authors claim gave an accuracy of 75.75% using six basic features.

Kang and Kang [24] proposed the use of an unsupervised DBN to train parameters to initialise the DNN, which yielded improved classification results (exact details of the approach are not clear). Their evaluation shows improved performance in terms of classification errors.

Hodo et al. [25] have produced a comprehensive taxonomy and survey of notable NIDS approaches that utilise deep and shallow learning. They have also aggregated some of the most pertinent results from these works.

In addition, there is other relevant work, including the DDoS detection system proposed by Niyaz et al. [26]. They propose a deep learning-based DDoS detection system for a software defined network (SDN). Evaluation is performed using custom generated traffic traces. The authors claim to have achieved binary classification accuracy of 99.82% and 8-class classification accuracy of 95.65%. However, we feel that drawing comparisons with this paper would be unfair due to the contextual difference of the dataset. Specifically, benchmark KDD datasets cover different distinct categories of attack, whereas the dataset used in this paper focuses on subcategories of the same attack.

You et al. [16] propose an automatic security auditing tool for short messages (SMS). Their method is based upon the RNN model. The authors claimed that their evaluations resulted in an accuracy rate of 92.7%, thus improving upon existing classification methods (e.g. SVM and Naive Bayes).

Wang et al. [27] propose an approach for detecting malicious JavaScript. Their method uses a 3-layer SdA with linear regression. It was evaluated against other classifier techniques, showing that it had the highest true positive rate but the second best false positive rate.

The work by Hou et al. [3] outlines their commercial Android malware detection framework, Deep4MalDroid. Their method involves the use of stacked auto-encoders, with the best accuracy resulting from 3 layers. 10-fold cross-validation was used, showing that in comparison to shallow learning, their approach offers improved detection performance.

Lee et al. [28] propose a deep-learning approach to fault monitoring in semiconductor manufacturing. They use a Stacked de-noising Auto-encoder (SdA) approach to provide an unsupervised learning solution. A comparison with conventional methods has demonstrated that the approach increases accuracy by up to 14% in different use cases. They also concluded that among the SdAs analysed (1-4 layers), those with 4 layers produced the best results.

The findings from our literature review have shown that despite the high detection accuracies being achieved, there is still room for improvement. Such weaknesses include the reliance on human operators, long training times, inconsistent or average accuracy levels and the heavy modification of datasets (e.g. balancing or profiling). The area is still in an infantile stage, with most researchers still experimenting with combining various algorithms (e.g. training, optimisation, activation and classification) and layering approaches to produce the most accurate and efficient solution for a specific dataset. Hence, we believe the model and work presented in this paper will be able to make a valid contribution to the current pool of knowledge.

4 PROPOSED METHODOLOGY

4.1 Non-symmetric deep auto-encoder

Decreasing the reliance on human operators is a crucial requirement for future-proofing NIDSs. Hence, our aim is to devise a technique capable of providing reliable unsupervised feature learning, which can improve upon the performance and accuracy of existing techniques.
This paper introduces our NDAE, which is an auto-encoder featuring non-symmetrical multiple hidden layers. Fundamentally, this involves the proposed shift away from the encoder-decoder paradigm (symmetric) and towards utilising just the encoder phase (non-symmetric). The reasoning behind this is that, given the correct learning structure, it is possible to reduce both computational and time overheads, with minimal impact on accuracy and efficiency. An NDAE can be used as a hierarchical unsupervised feature extractor that scales well to accommodate high-dimensional inputs. It learns non-trivial features using a similar training strategy to that of a typical auto-encoder. An illustrated example is presented in Fig. 3.

Fig. 3. Comparison of a typical auto-encoder (an encode stage followed by a decode stage) and a non-symmetric deep auto-encoder (successive encode stages only).

The proposed NDAE takes an input vector x \in R^d and step-by-step maps it to the latent representations h_i \in R^{d_i} (here d and d_i represent the dimensions of the input and the i-th representation respectively) using the deterministic function shown in equation (3) below:

h_i = \sigma(W_i \cdot h_{i-1} + b_i), \quad i = 1, \ldots, n \qquad (3)

Here, h_0 = x, \sigma is an activation function (in this work we use the sigmoid function \sigma(t) = 1/(1 + e^{-t})) and n is the number of hidden layers.

Unlike a conventional auto-encoder or deep auto-encoder, the proposed NDAE does not contain a decoder, and its output vector is calculated from the final latent representation by the similar formula shown in equation (4):

y = \sigma(W_{n+1} \cdot h_n + b_{n+1}) \qquad (4)

The estimator of the model \theta = (W_i, b_i) can be obtained by minimising the square reconstruction error over the m training samples (x^{(i)}, y^{(i)}), i = 1, \ldots, m, as shown in equation (5):

E(\theta) = \sum_{i=1}^{m} \left( x^{(i)} - y^{(i)} \right)^2 \qquad (5)
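To make equations (3)-(5) concrete, the sketch below builds an encoder-only (non-symmetric) stack of sigmoid layers and trains it against the squared reconstruction-error objective of equation (5). The layer widths and training settings are illustrative assumptions, not the authors' published configuration.

```python
import numpy as np
import tensorflow as tf

# Input vector x in R^41 (the KDD-style feature dimension).
inputs = tf.keras.Input(shape=(41,))

# Equation (3): h_i = sigmoid(W_i . h_{i-1} + b_i), with h_0 = x.
# No decoder follows -- the network is non-symmetric (encoder only).
h = inputs
for dim in (14, 28, 28):          # n = 3 hidden layers (widths assumed)
    h = tf.keras.layers.Dense(dim, activation="sigmoid")(h)

# Equation (4): the output y = sigmoid(W_{n+1} . h_n + b_{n+1}),
# sized to match x so that the reconstruction error is well defined.
y = tf.keras.layers.Dense(41, activation="sigmoid")(h)
ndae = tf.keras.Model(inputs, y)

def squared_reconstruction_error(x_true, y_pred):
    # Equation (5): sum of squared differences between input and output.
    return tf.reduce_sum(tf.square(x_true - y_pred), axis=-1)

ndae.compile(optimizer="adam", loss=squared_reconstruction_error)

x = np.random.rand(1000, 41).astype("float32")  # stand-in data
ndae.fit(x, x, epochs=10, batch_size=32, verbose=0)
```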
4.2 Stacked non-symmetric deep auto-encoders

In this subsection, we detail the novel deep learning classification model we have created to address the problems identified with current NIDSs.

Fundamentally, our model is based upon using our NDAE technique (outlined in Section 4.1) for deep learning. This is achieved by stacking our NDAEs to create a deep learning hierarchy. Stacking the NDAEs offers a layer-wise unsupervised representation learning algorithm, which allows our model to learn the complex relationships between different features. It also has feature extraction capabilities, so it is able to refine the model by prioritising the most descriptive features.

Due to the data that we envisage this model using, we have designed the model to handle large and complex datasets (further details on this are provided in Section 6). Despite the 42 features present in the KDD Cup '99 and NSL-KDD datasets being comparatively small, we maintain that this provides a benchmark indication as to the model's capability.

However, the classification power of stacked auto-encoders with a typical soft-max layer is relatively weak compared to other discriminative models including RF, KNN and SVM. Hence, we have combined the deep learning power of our stacked NDAEs with a shallow learning classifier. For our shallow learning classifier, we have decided upon using Random Forest. Current comparative research, such as that by Choudhury and Bhowal [29], and Anbar et al. [30], shows that RF is one of the best algorithms for intrusion detection; these findings were replicated by our own initial tests. Additionally, there are many examples of current intrusion detection research also utilising RF, such as [31] and [32].

RF is an ensemble learning method, the principle of which is to group 'weak learners' to form a 'strong learner' [33]. In this instance, numerous individual decision trees (the weak learners) are combined to form a forest. RF can be considered as the bagging (records are selected at random with replacement from the original data) of these un-pruned decision trees, with a random selection of features at each split. It boasts advantages such as low levels of bias, robustness to outliers and overfitting correction, all of which would be useful in a NIDS scenario.

In our model, we train the RF classifier using the encoded representations learned by the stacked NDAEs to classify network traffic into normal data and known attacks.
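A minimal sketch of this hybrid arrangement is given below: the encoded representation produced by an NDAE stack (built as in the Section 4.1 sketch, with the final hidden layer as output) is fed to scikit-learn's RandomForestClassifier. The variable names and parameters are assumptions for illustration only.

```python
import numpy as np
import tensorflow as tf
from sklearn.ensemble import RandomForestClassifier

# Encoder-only stack exposing the final latent representation; in practice
# its weights would come from the unsupervised NDAE training stage.
inputs = tf.keras.Input(shape=(41,))
h = inputs
for dim in (14, 28, 28):
    h = tf.keras.layers.Dense(dim, activation="sigmoid")(h)
encoder = tf.keras.Model(inputs, h)

x_train = np.random.rand(1000, 41).astype("float32")  # stand-in records
y_train = np.random.randint(0, 5, size=1000)           # 5-class labels

# Shallow learning stage: the RF is trained on the encoded representations.
features = encoder.predict(x_train, verbose=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(features, y_train)

# Classification of new traffic follows the same two-stage path.
predictions = rf.predict(encoder.predict(x_train[:5], verbose=0))
```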
In deep learning research, the exact structure of a model dictates its success. Currently, researchers are unable to explain what makes a successful deep learning structure. The exact structure of our model resulted from experimenting with numerous structural compositions to achieve the best results. The final structure of our proposed model is shown in Fig. 4.

Fig. 4. Stacked NDAE Classification Model (a 41-dimensional input feeds NDAE 1, with hidden layers of 14, 28 and 28 neurons; its output feeds NDAE 2, with hidden layers of 14, 28 and 28 neurons; the encoded output is passed to a Random Forest classifier).

As per Fig. 4, our model uses two NDAEs arranged in a stack, combined with the RF algorithm. Each NDAE has 3 hidden layers, with each hidden layer using the same number of neurons as that of the features (indicated by the numbering in the diagram). These exact parameters were determined by cross-validating numerous combinations (i.e. numbers of neurons and hidden layers) until the most effective was identified. This allows for performance evaluation without the risk of overfitting. For our experiments, we used the 10-fold cross-validation approach on the NSL-KDD dataset using Scikit-learn. The result for our final model structure was 0.995999 +/- 0.000556, which is a very promising result.
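One way such a figure can be obtained (hedged: the authors do not publish their exact script, so scikit-learn's cross_val_score and the stand-in data below are our assumptions) is:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Stand-ins for the NDAE-encoded NSL-KDD training data and labels.
features = np.random.rand(1000, 28)
labels = np.random.randint(0, 5, size=1000)

# 10-fold cross-validation of the classification stage; in model selection,
# this would be repeated for each candidate stack structure.
scores = cross_val_score(RandomForestClassifier(n_estimators=100),
                         features, labels, cv=10)
print(f"{scores.mean():.6f} +/- {scores.std():.6f}")
```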
5 EVALUATION & RESULTS

Similar to most existing deep learning research, our proposed classification model (Section 4.2) was implemented using TensorFlow. All of our evaluations were performed using GPU-enabled TensorFlow running on a 64-bit Ubuntu 16.04 LTS PC with an Intel Xeon 3.60GHz processor, 16 GB RAM and an NVIDIA GTX 750 GPU.

To perform our evaluations, we have used the KDD Cup '99 and NSL-KDD datasets. Both of these datasets are considered benchmarks within NIDS research. Furthermore, using these datasets assists in drawing comparisons with existing methods and research.

Throughout this section, we will be using the metrics defined below:

• True Positive (TP) - Attack data that is correctly classified as an attack.
• False Positive (FP) - Normal data that is incorrectly classified as an attack.
• True Negative (TN) - Normal data that is correctly classified as normal.
• False Negative (FN) - Attack data that is incorrectly classified as normal.
We will be using the following measures to evaluate the performance of our proposed solution:

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (6)

The accuracy measures the proportion of the total number of correct classifications.

\text{Precision} = \frac{TP}{TP + FP} \qquad (7)

The precision measures the number of correct classifications penalised by the number of incorrect classifications.

\text{Recall} = \frac{TP}{TP + FN} \qquad (8)

The recall measures the number of correct classifications penalised by the number of missed entries.

\text{False Alarm} = \frac{FP}{FP + TN} \qquad (9)

The false alarm rate measures the proportion of benign events incorrectly classified as malicious.

\text{F-score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (10)

The F-score measures the harmonic mean of precision and recall, which serves as a derived effectiveness measurement.
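The sketch below computes measures (6)-(10) from raw counts; it is a direct transcription of the formulas, with illustrative counts only.

```python
def nids_metrics(tp, fp, tn, fn):
    """Evaluation measures (6)-(10) computed from raw classification counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)                # (6)
    precision = tp / (tp + fp)                                # (7)
    recall = tp / (tp + fn)                                   # (8)
    false_alarm = fp / (fp + tn)                              # (9)
    f_score = 2 * precision * recall / (precision + recall)  # (10)
    return accuracy, precision, recall, false_alarm, f_score

# Illustrative counts only (not taken from the paper's experiments).
print(nids_metrics(tp=950, fp=30, tn=900, fn=50))
```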
5.1 Datasets

This paper utilises the KDD Cup '99 and NSL-KDD benchmark datasets, both of which have been used extensively in IDS research involving traffic with both normal and abnormal connections.

5.1.1 KDD Cup '99

The KDD Cup '99 dataset was used in DARPA's IDS evaluation program [34]. The data consists of 4 gigabytes' worth of compressed tcpdump data resulting from 7 weeks of network traffic. This can be processed into about 5 million connection records, each with about 100 bytes. It consists of approximately 4,900,000 single connection vectors, each of which contains 41 features. These include basic features (e.g. protocol type, packet size), domain knowledge features (e.g. number of failed logins) and timed observation features (e.g. % of connections with SYN errors). Each vector is labelled as either normal or as an attack (of which there are 22 specific attack types, as outlined in Table 1).

It is common practice to use 10% of the full-size dataset, as this provides a suitable representation with reduced computational requirements. This 10% subset is produced and disseminated alongside the original dataset. In this paper, we use the 10% subset (herein referred to as KDD Cup '99), which contains 494,021 training records and 311,029 testing records. The exact composition is shown in Table 1.

TABLE 1. Composition of Datasets

Category | Attack Type       | 10% KDD '99 Train | 10% KDD '99 Test | NSL-KDD Train | NSL-KDD Test
DoS      | 'back'            | 2203   | 1098   | 956    | 359
DoS      | 'land'            | 21     | 9      | 18     | 7
DoS      | 'neptune'         | 107201 | 58001  | 41214  | 4657
DoS      | 'pod'             | 264    | 87     | 201    | 41
DoS      | 'smurf'           | 280790 | 164091 | 2646   | 665
DoS      | 'teardrop'        | 979    | 12     | 892    | 12
Probe    | 'ipsweep'         | 1247   | 306    | 3599   | 141
Probe    | 'nmap'            | 231    | 84     | 1493   | 73
Probe    | 'portsweep'       | 1040   | 354    | 2931   | 157
Probe    | 'satan'           | 1589   | 1633   | 3633   | 735
R2L      | 'ftp_write'       | 8      | 3      | 8      | 3
R2L      | 'guess_password'  | 53     | 4367   | 53     | 1231
R2L      | 'imap'            | 12     | 1      | 11     | 1
R2L      | 'multihop'        | 7      | 18     | 7      | 18
R2L      | 'phf'             | 4      | 2      | 4      | 2
R2L      | 'spy'             | 2      | 0      | 2      | 0
R2L      | 'warezclient'     | 1020   | 0      | 890    | 0
R2L      | 'warezmaster'     | 20     | 1602   | 20     | 944
U2R      | 'loadmodule'      | 9      | 2      | 9      | 2
U2R      | 'buffer_overflow' | 30     | 22     | 30     | 20
U2R      | 'rootkit'         | 10     | 13     | 10     | 13
U2R      | 'perl'            | 3      | 2      | 3      | 2
         | Normal            | 97278  | 60593  | 67343  | 9711
         | Total             | 494021 | 292300 | 125973 | 18794

The KDD Cup '99 dataset needs pre-processing to be successfully utilised with our proposed stacked NDAE model. This is because our model operates using only numeric values, but records in the dataset contain a mixture of numeric and symbolic values, so a data transformation was needed to convert them. In addition, integer values also needed normalisation, as they were mixed with floating point values between 0 and 1, which would make learning difficult.
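A typical way to perform these two steps is sketched below. The paper does not specify its exact transformation, so one-hot encoding of the symbolic fields and min-max scaling of the integer fields are our assumptions, as are the column names used here.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Stand-in for KDD Cup '99 records: symbolic and numeric fields mixed.
df = pd.DataFrame({
    "duration": [0, 12, 3],
    "protocol_type": ["tcp", "udp", "icmp"],   # symbolic
    "service": ["http", "domain_u", "ecr_i"],  # symbolic
    "src_bytes": [181, 239, 1032],
})

# Symbolic -> numeric via one-hot encoding.
df = pd.get_dummies(df, columns=["protocol_type", "service"])

# Scale the integer-valued columns into [0, 1].
scaler = MinMaxScaler()
df[["duration", "src_bytes"]] = scaler.fit_transform(df[["duration", "src_bytes"]])
```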
5.1.2 NSL-KDD

The newer NSL-KDD dataset was produced by Tavallaee et al. to overcome the inherent problems of the KDD '99 data set, which are discussed in [35]. Although this new version of the dataset still suffers from some of the problems discussed by McHugh in [36], and may not be a perfect representation of existing real networks, most current NIDS research still uses it, so we believe it remains an effective benchmark to help researchers compare different methods.

The NSL-KDD dataset has fundamentally the same structure as the KDD Cup '99 dataset (i.e. it has 22 attack patterns or normal traffic, and fields for 41 features). We will be using the whole NSL-KDD dataset for our evaluations, the composition of which is also shown in Table 1.

Some of the attack patterns in Table 1 contain fewer than 20 occurrences in the dataset; 20 is the minimum threshold required for accurate levels of training and evaluation, so for this paper these attacks have been omitted.

One of the most prominent techniques currently used within deep learning research is DBNs [7], [1], [2]. One notable publication on the technique is by Alrawashdeh and Purdy [18], where the authors propose the use of a DBN model for NIDSs. Hence, for our evaluation we draw a direct comparison between the results obtained from our proposed model and the DBN model. We will also compare the results of our model against those published by Alrawashdeh and Purdy.

5.2 KDD Cup '99

In this section, we evaluate the 5-class classification performance of our proposed classification model against the DBN model published in [18], using the KDD Cup '99 dataset as outlined in the previous subsection.

The results obtained from the 5-class analysis of the KDD Cup '99 dataset by both the DBN model in [18] and our stacked NDAE model are presented in Table 2. By comparing the results of both models, we can see that overall the effectiveness and accuracy of our stacked NDAE model are better than, or comparable with, those achieved by the model in [18]. However, notable exceptions to this are the "U2R" and "R2L" classes, which will be discussed in Section 6.

Time efficiency is an important consideration for our model, particularly when applied within a NIDS. Hence, we have measured the training time required by our stacked NDAE model and a DBN model to analyse the KDD '99 dataset. However, it would not be fair to draw comparisons with [18] in this respect, due to differences in the hardware and software used. Therefore, we have implemented a DBN model in TensorFlow, and the results obtained are presented in Table 3.

TABLE 3. KDD '99 Training Time

No. Neurons in Hidden Layers | DBN Training Time (s) | S-NDAE Training Time (s) | Time Saving (%)
8  | 54660  | 2024 | 96.30
14 | 122460 | 2381 | 98.06
22 | 204900 | 2446 | 98.81

As Table 3 shows, the non-symmetric approach of our model is able to accomplish a significant reduction in the required training time, offering an average reduction of 97.72%. Hence, it is promising that our model can maintain high levels of accuracy whilst drastically reducing the required training time.

5.3 NSL-KDD

Unfortunately, the paper [18] does not provide evaluations using the NSL-KDD dataset. Thus we will be using the previously-discussed TensorFlow DBN model for comparisons. To maximise comparability, we have undertaken two separate evaluations based on (a) 5-class classification as used in KDD Cup '99, and (b) 13-class classification from NSL-KDD (this selection is explained in Section 5.1).

5.3.1 5-Class Classification

By using the same 5 generic class labels as used in the KDD Cup '99 dataset, we can compare the performance of the two models between the two datasets. It also aids comparability against similar works adopting this strategy. The performance results are presented in Table 4 and illustrated by the Receiver Operating Characteristic (ROC) curve in Figure 5.

Fig. 5. ROC Curve for NSL-KDD 5-Class.

From the table, it is evident that our model offers increased accuracy, precision, recall and effectiveness (F-score), together with a reduced false alarm rate, when compared to the DBN approach.

5.3.2 13-Class Classification

As discussed previously, our model is designed to work with larger and more complex datasets. Therefore, we evaluate our model's classification capabilities on a 13-class dataset. These 13 labels are those with more than the minimum 20 entries. The purpose of this analysis is to compare the stability of our model when the number of attack classes increases; therefore, we do not compare these results against another model. The corresponding performance analysis is presented in Table 5. It is evident when these results are compared to those in Table 4 (the 5-class performance) that overall it performs better, with the average accuracy increasing from 85.42% to 89.22%. One of our initial goals was to support the granularity required by modern networks.
TABLE 2. KDD Cup '99 Performance (each measure shown as DBN / S-NDAE)

Attack Class | No. Training | No. Attacks | Accuracy (%) | Precision (%) | Recall (%) | F-Score (%) | False Alarm (%)
Normal | 97278  | 60593  | 99.49 / 99.49 | 94.51 / 100.00 | 99.49 / 99.49 | 96.94 / 99.75 | 5.49 / 8.92
DoS    | 391458 | 223298 | 99.65 / 99.79 | 98.74 / 100.00 | 99.65 / 99.79 | 99.19 / 99.89 | 1.26 / 0.04
Probe  | 4107   | 2377   | 14.19 / 98.74 | 86.66 / 100.00 | 14.19 / 98.74 | 24.38 / 99.36 | 13.34 / 10.83
R2L    | 1126   | 5993   | 89.25 / 9.31  | 100.00 / 100.00 | 89.25 / 9.31 | 94.32 / 17.04 | 0.00 / 0.71
U2R    | 52     | 39     | 7.14 / 0.00   | 38.46 / 0.00   | 7.14 / 0.00   | 12.05 / 0.00  | 61.54 / 100.00
Total  | 494021 | 292300 | 97.90 / 97.85 | 97.81 / 99.99  | 97.91 / 97.85 | 97.47 / 98.15 | 2.10 / 2.15

TABLE 4. NSL-KDD 5-class Performance (each measure shown as DBN / S-NDAE)

Attack Class | No. Training | No. Attacks | Accuracy (%) | Precision (%) | Recall (%) | F-Score (%) | False Alarm (%)
DoS    | 45927  | 5741  | 87.96 / 94.58 | 100.00 / 100.00 | 87.96 / 94.58 | 93.60 / 97.22 | 8.80 / 1.07
Normal | 67343  | 9711  | 95.64 / 97.73 | 100.00 / 100.00 | 95.64 / 97.73 | 97.77 / 98.85 | 24.29 / 20.62
Probe  | 11656  | 1106  | 72.97 / 94.67 | 100.00 / 100.00 | 72.97 / 94.67 | 84.37 / 97.26 | 18.40 / 16.84
R2L    | 995    | 2199  | 0.00 / 3.82   | 0.00 / 100.00   | 0.00 / 3.82   | 0.00 / 7.36   | 0.00 / 3.45
U2R    | 52     | 37    | 0.00 / 2.70   | 0.00 / 100.00   | 0.00 / 2.70   | 0.00 / 5.26   | 0.00 / 50.00
Total  | 125973 | 18794 | 80.58 / 85.42 | 88.10 / 100.00  | 80.58 / 85.42 | 84.08 / 87.37 | 19.42 / 14.58

Therefore, these results are a promising indication that our model can perform better when faced with more detailed and granular datasets.

Timeliness is critical in a modern NIDS, thus we also evaluate the training time required for the NSL-KDD dataset. The results of this comparison are shown in Table 6.

TABLE 6. NSL-KDD Time Comparison

No. Neurons in Hidden Layers | DBN Training Time (s) | S-NDAE Training Time (s) | Time Saving (%)
8  | 1198.08  | 644.84  | 46.18
14 | 10984.04 | 722.54  | 93.42
22 | 21731.76 | 1091.97 | 94.98

From these results, we can see that across the different hidden layer compositions, our model is able to consistently reduce the required training time compared with the DBN.

6 DISCUSSION

Our evaluations show that our proposed stacked NDAE model has produced a promising set of results.

6.1 5-Class KDD Cup '99 Classification

With regards to the KDD Cup '99 dataset evaluation, the results show that our model is able to offer an average accuracy of 97.85%. More specifically, the results show that our accuracy is better than or comparable with the work in [18] in 3 out of 5 classes. It is also a significant improvement on other deep learning methods such as [23]. However, it is noted that the results for the "R2L" and "U2R" attack classes are anomalous. The stacked NDAE model requires greater amounts of data to learn from; unfortunately, due to the smaller amount of training data available for these classes, the results achieved are less stable. Despite this, it is evident from the performance analysis that our model can offer improved precision, recall and F-score, especially for larger classes. Furthermore, our model managed to produce these comparable performance results whilst consistently reducing the required training time by an average of 97.72%.

6.2 5-Class NSL-KDD Classification

With regards to the NSL-KDD dataset, we can see from the results that throughout all of the measures our model yields a superior level of performance in 3 of the 5 classes. Notably, the model offered a total accuracy rate of 85.42%, which improves upon the DBN model by just under 5%. It also offered a 4.84% reduction in the false alarm rate. The results also re-emphasise the point made earlier that our model does not handle the smaller classes ("R2L" and "U2R") as well.

Another important factor is that the time required to train our model is drastically reduced, yielding an average time saving of 78.19% against the DBN. This is of critical importance, particularly for application in a NIDS.

6.3 13-Class NSL-KDD Classification

The results from the 13-class classification evaluation demonstrate that our model was able to offer a 3.8% improvement on its own accuracy simply by using a more granular dataset. This supports our claim that the model is able to work more effectively with larger and more complex datasets.

Furthermore, the larger dataset gives a better insight into the weaknesses in our model. As can be seen from the results, there is a direct correlation between the size of the training dataset for each label and the accuracy/error rates. This supports our observation that the smaller classes (in this case 'back', 'guess_password', 'teardrop' and 'warezclient') yield lower levels of accuracy using our model.

However, it must also be noted that the larger classes yielded consistently high rates throughout all of the performance measures.

6.4 Comparison With Related Works

We have also compared the results from our stacked NDAE model against the results obtained from similar deep learning-based NIDSs.

In [20], the authors claim their 5-class classification of the NSL-KDD dataset produced an f-score of 75.76%. Their
TABLE 5. NSL-KDD 13-class Performance

Label | No. Training | No. Attack | Accuracy (%) | Precision (%) | Recall (%) | F-score (%) | False Alarm (%)
'back'            | 956   | 359  | 36.77  | 100.00 | 36.77  | 53.77  | 0.00
'buffer_overflow' | 30    | 20   | 0.00   | 0.00   | 0.00   | 0.00   | 100.00
'guess_password'  | 53    | 1231 | 0.00   | 0.00   | 0.00   | 0.00   | 0.00
'ipsweep'         | 3599  | 141  | 98.58  | 100.00 | 98.58  | 99.29  | 6.71
'neptune'         | 41214 | 4657 | 98.05  | 100.00 | 98.05  | 99.01  | 0.00
'nmap'            | 1493  | 73   | 100.00 | 100.00 | 100.00 | 100.00 | 0.00
'normal'          | 67343 | 9711 | 97.91  | 100.00 | 97.91  | 98.94  | 14.70
'pod'             | 201   | 41   | 92.68  | 100.00 | 92.68  | 96.20  | 28.30
'portsweep'       | 2931  | 157  | 95.54  | 100.00 | 95.54  | 97.72  | 44.03
'satan'           | 3633  | 735  | 82.45  | 100.00 | 82.45  | 90.38  | 13.92
'smurf'           | 2646  | 665  | 99.10  | 100.00 | 99.10  | 99.55  | 0.15
'teardrop'        | 892   | 12   | 100.00 | 100.00 | 100.00 | 100.00 | 75.51
'warezclient'     | 890   | 0    | 0.00   | 0.00   | 0.00   | 0.00   | 0.00
Total             | 125881| 17802| 89.22  | 92.97  | 89.22  | 90.76  | 10.78

recall and precision results are not listed, but the bar charts show them to be around 69% and 83% respectively. Our model has produced superior results by offering an f-score of 87.37%, recall of 85.42% and precision of 100.00%.

Tang et al. [23] claim that their Deep Neural Network (DNN) approach achieved an accuracy of 75.75% when performing a 5-class classification of the NSL-KDD dataset. This result is lower than our achieved accuracy of 85.42%.

Whilst classifying the KDD Cup '99 dataset, Kim et al. [37] claim they have achieved an accuracy of 96.93%. Also, Gao et al. [38] claim their deep learning DBN model achieved an accuracy of 93.49%. Both of these results are less than the 97.85% accomplished by our model.

These comparisons show that our model's results are very promising when compared to other current deep learning-based methods.

7 CONCLUSION & FUTURE WORK

In this paper, we have discussed the problems faced by existing NIDS techniques. In response to this, we have proposed our novel NDAE method for unsupervised feature learning. We have then built upon this by proposing a novel classification model constructed from stacked NDAEs and the RF classification algorithm.

We have implemented our proposed model in TensorFlow and performed extensive evaluations of its capabilities. For our evaluations, we have utilised the benchmark KDD Cup '99 and NSL-KDD datasets and achieved very promising results.

Our results have demonstrated that our approach offers high levels of accuracy, precision and recall, together with reduced training time. Most notably, we have compared our stacked NDAE model against the mainstream DBN technique. These comparisons have demonstrated that our model offers up to a 5% improvement in accuracy and a training time reduction of up to 98.81%. Unlike most previous work, we have evaluated the capabilities of our model on both benchmark datasets, revealing a consistent level of classification accuracy.

Although our model has achieved the above promising results, we acknowledge that it is not perfect and there is further room for improvement.

In our future work, the first avenue of exploration for improvement will be to assess and extend the capability of our model to handle zero-day attacks. We will then look to expand upon our existing evaluations by utilising real-world backbone network traffic to demonstrate the merits of the extended model.

ACKNOWLEDGEMENTS

The authors would like to thank the Royal Academy of Engineering for the support provided through the Newton Research Collaboration Programme.

REFERENCES

[1] B. Dong and X. Wang, "Comparison deep learning method to traditional methods using for network intrusion detection," in 2016 8th IEEE International Conference on Communication Software and Networks (ICCSN). Beijing, China: IEEE, Jun 2016, pp. 581-585.
[2] R. Zhao, R. Yan, Z. Chen, K. Mao, P. Wang, and R. X. Gao, "Deep learning and its applications to machine health monitoring: A survey," submitted to IEEE Transactions on Neural Networks and Learning Systems, vol. 14, no. 8, pp. 1-14, Dec 2016. [Online]. Available: http://arxiv.org/abs/1612.07640
[3] S. Hou, A. Saas, L. Chen, and Y. Ye, "Deep4MalDroid: A deep learning framework for Android malware detection based on Linux kernel system call graphs," in 2016 IEEE/WIC/ACM International Conference on Web Intelligence Workshops (WIW). Omaha, Nebraska, USA: IEEE, Oct 2016, pp. 104-111.
[4] IDC, "Executive summary: Data growth, business opportunities, and the IT imperatives — The digital universe of opportunities: Rich data and the increasing value of the Internet of Things," IDC, MA, USA, Tech. Rep., 2014. [Online]. Available: https://www.emc.com/leadership/digital-universe/2014iview/executive-summary.htm
[5] Juniper Networks, "Juniper Networks - How many packets per second per port are needed to achieve wire-speed?" 2015. [Online]. Available: https://kb.juniper.net/InfoCenter/index?page=content&id=KB14737
[6] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016, http://www.deeplearningbook.org.
[7] L. Deng, "Deep learning: Methods and applications," Foundations and Trends in Signal Processing, vol. 7, no. 3-4, pp. 197-387, Aug 2014.
[8] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion," Journal of Machine Learning Research, vol. 11, pp. 3371-3408, 2010.
[9] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504-507, 2006.
[10] Y. Wang, H. Yao, and S. Zhao, "Auto-encoder based dimensionality reduction," Neurocomputing, vol. 184, pp. 232-242, 2016.
[11] Z. Liang, G. Zhang, J. X. Huang, and Q. V. Hu, "Deep learning for healthcare decision making with EMRs," in 2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, Nov 2014, pp. 556-559.
[12] S. P. Shashikumar, A. J. Shah, Q. Li, G. D. Clifford, and S. Nemati, "A deep learning approach to monitoring and detecting atrial fibrillation using wearable technology," in 2017 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI). Florida, USA: IEEE, 2017, pp. 141-144.
[13] F. Falcini, G. Lami, and A. M. Costanza, "Deep learning in automotive software," IEEE Software, vol. 34, no. 3, pp. 56-63, May 2017. [Online]. Available: http://ieeexplore.ieee.org/document/7927925/
[14] A. Luckow, M. Cook, N. Ashcraft, E. Weill, E. Djerekarov, and B. Vorster, "Deep learning in the automotive industry: Applications and tools," in 2016 IEEE International Conference on Big Data (Big Data). IEEE, Dec 2016, pp. 3759-3768. [Online]. Available: http://ieeexplore.ieee.org/document/7841045/
[15] H. Lee, Y. Kim, and C. O. Kim, "A deep learning model for robust wafer fault monitoring with sensor measurement noise," IEEE Transactions on Semiconductor Manufacturing, vol. 30, no. 1, pp. 23-31, Feb 2017.
[16] L. You, Y. Li, Y. Wang, J. Zhang, and Y. Yang, "A deep learning-based RNNs model for automatic security audit of short messages," in 2016 16th International Symposium on Communications and Information Technologies (ISCIT). Qingdao, China: IEEE, Sep 2016, pp. 225-229.
[17] R. Polishetty, M. Roopaei, and P. Rad, "A next-generation secure cloud-based deep learning license plate recognition for smart cities," in 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA). Anaheim, California, USA: IEEE, Dec 2016, pp. 286-293.
[18] K. Alrawashdeh and C. Purdy, "Toward an online anomaly intrusion detection system based on deep learning," in 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA). Anaheim, California, USA: IEEE, Dec 2016, pp. 195-200.
[19] J. Kim, N. Shin, S. Y. Jo, and S. H. Kim, "Method of intrusion detection using deep neural network," in 2017 IEEE International Conference on Big Data and Smart Computing (BigComp). Hong Kong, China: IEEE, Feb 2017, pp. 313-316.
[20] A. Javaid, Q. Niyaz, W. Sun, and M. Alam, "A deep learning approach for network intrusion detection system," in Proceedings of the 9th EAI International Conference on Bio-inspired Information and Communications Technologies, ser. BICT'15. Brussels, Belgium: ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), 2016, pp. 21-26. [Online]. Available: http://dx.doi.org/10.4108/eai.3-12-2015.2262516
[21] S. Potluri and C. Diedrich, "Accelerated deep neural networks for enhanced intrusion detection system," in 2016 IEEE 21st International Conference on Emerging Technologies and Factory Automation (ETFA). Berlin, Germany: IEEE, Sep 2016, pp. 1-8.
[22] C. Garcia Cordero, S. Hauke, M. Muhlhauser, and M. Fischer, "Analyzing flow-based anomaly intrusion detection using replicator neural networks," in 2016 14th Annual Conference on Privacy, Security and Trust (PST). Auckland, New Zealand: IEEE, Dec 2016, pp. 317-324.
[23] T. A. Tang, L. Mhamdi, D. McLernon, S. A. R. Zaidi, and M. Ghogho, "Deep learning approach for network intrusion detection in software defined networking," in 2016 International Conference on Wireless Networks and Mobile Communications (WINCOM). IEEE, Oct 2016, pp. 258-263.
[24] M.-J. Kang and J.-W. Kang, "Intrusion detection system using deep neural network for in-vehicle network security," PLOS ONE, vol. 11, no. 6, p. e0155781, Jun 2016.
[25] E. Hodo, X. J. A. Bellekens, A. Hamilton, C. Tachtatzis, and R. C. Atkinson, "Shallow and deep networks intrusion detection system: A taxonomy and survey," CoRR, vol. abs/1701.02145, 2017. [Online]. Available: http://arxiv.org/abs/1701.02145
[26] Q. Niyaz, W. Sun, and A. Y. Javaid, "A deep learning based DDoS detection system in software-defined networking (SDN)," CoRR, vol. abs/1611.07400, 2016. [Online]. Available: http://arxiv.org/abs/1611.07400
[27] Y. Wang, W.-D. Cai, and P.-C. Wei, "A deep learning approach for detecting malicious JavaScript code," Security and Communication Networks, vol. 9, no. 11, pp. 1520-1534, Jul 2016.
[28] H.-W. Lee, N.-R. Kim, and J.-H. Lee, "Deep neural network self-training based on unsupervised learning and dropout," The International Journal of Fuzzy Logic and Intelligent Systems, vol. 17, no. 1, pp. 1-9, Mar 2017. [Online]. Available: http://www.ijfis.org/journal/view.html?doi=10.5391/IJFIS.2017.17.1.1
[29] S. Choudhury and A. Bhowal, "Comparative analysis of machine learning algorithms along with classifiers for network intrusion detection," in 2015 International Conference on Smart Technologies and Management for Computing, Communication, Controls, Energy and Materials (ICSTM), May 2015, pp. 89-95.
[30] M. Anbar, R. Abdullah, I. H. Hasbullah, Y. W. Chong, and O. E. Elejla, "Comparative performance analysis of classification algorithms for intrusion detection system," in 2016 14th Annual Conference on Privacy, Security and Trust (PST), Dec 2016, pp. 282-288.
[31] Y. Chang, W. Li, and Z. Yang, "Network intrusion detection based on random forest and support vector machine," in 2017 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC), Jul 2017, pp. 635-638.
[32] Y. Y. Aung and M. M. Min, "An analysis of random forest algorithm based network intrusion detection system," in 2017 18th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), Jun 2017, pp. 127-132.
[33] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.
[34] S. J. Stolfo, W. Fan, W. Lee, A. Prodromidis, and P. K. Chan, "Cost-based modeling for fraud and intrusion detection: Results from the JAM project," in Proceedings of the 2000 DARPA Information Survivability Conference and Exposition. IEEE, 2000, pp. 130-144.
[35] M. Tavallaee, E. Bagheri, W. Lu, and A.-A. Ghorbani, "A detailed analysis of the KDD Cup 99 data set," in Second IEEE Symposium on Computational Intelligence for Security and Defence Applications. IEEE, 2009, pp. 53-58.
[36] J. McHugh, "Testing intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory," ACM Transactions on Information and System Security, vol. 3, no. 4, pp. 262-294, 2000.
[37] J. Kim, J. Kim, H. L. T. Thu, and H. Kim, "Long short term memory recurrent neural network classifier for intrusion detection," in 2016 International Conference on Platform Technology and Service (PlatCon). IEEE, Feb 2016, pp. 1-5.
[38] N. Gao, L. Gao, Q. Gao, and H. Wang, "An intrusion detection model based on deep belief networks," in 2014 Second International Conference on Advanced Cloud and Big Data, Nov 2014, pp. 247-252.

Nathan Shone is a Lecturer in the Department of Computer Science at Liverpool John Moores University (LJMU) in the UK. He completed his PhD at LJMU in Network Security, focusing on misbehaviour detection in complex system-of-systems. His research interests include anomaly detection, misbehaviour monitoring, IoT security and security monitoring.

Tran Nguyen Ngoc is the Head of the Department for Information Security at Le Quy Don Technical University in Vietnam. He received his PhD in system analysis, control and information processing from Don State Technical University, Russia. His research interests focus on pattern recognition, cyber security and artificial intelligence.

Vu Dinh Phai is a Researcher in the Department for Information Security at Le Quy Don Technical University in Vietnam. Since 2013, he has been involved in various research projects and teaching at LQDU. He received his Masters degree in Information Systems from LQDU in 2016. His research interests include network security, wireless security and machine learning.

Qi Shi is a Professor in Computer Security in the Department of Computer Science at Liverpool John Moores University (LJMU) in the UK. He received his PhD in Computing from the Dalian University of Technology, P.R. China. His research interests include security protocol design, ubiquitous computing security, cloud security, sensor network security, computer forensics and intrusion detection.
