A Novel Statistical Analysis and Autoencoder Driven Intelligent Intrusion Detection Approach

Cosimo Ieracitano, Ahsan Adeel, Francesco Carlo Morabito, Amir Hussain
c School of Computing, Edinburgh Napier University, Edinburgh EH10 5DT, UK
Abstract
In the current digital era, one of the most critical and challenging issues is ensuring cybersecurity in information technology (IT) infrastructures. Indeed, with the significant improvement of technology, hackers have been developing ever more complex and dangerous malware attacks that make intrusion recognition a very difficult task. In this context, existing traditional analytic tools face severe challenges in detecting and mitigating these threats. In this work, we introduce a statistical analysis and autoencoder (AE) driven intelligent intrusion detection system (IDS). Specifically, the proposed IDS combines data analytics and statistical techniques with recent advances in machine learning theory to extract optimized and more correlated features. The validity of the proposed IDS is tested using the benchmark NSL-KDD database. Experimental results show that the designed IDS achieves better classification performance compared to deep and conventional shallow machine learning approaches, as well as to recently proposed state-of-the-art techniques.
Keywords: Anomaly Detection, Deep Learning, Autoencoder, Optimized Feature Extraction, NSL-KDD database.
1. Introduction
∗ Corresponding author
Email addresses: [email protected] (Cosimo Ieracitano),
[email protected] (Ahsan Adeel), [email protected] (Francesco Carlo Morabito),
[email protected] (Amir Hussain)
As can be seen in Tables 5 and 8, the proposed AE processor outperformed all other approaches, reporting accuracy values of 84.21% and 87% in binary and multi-class classification, respectively.
The main contributions of this work can be summarized as follows:
The present paper is organized as follows. Section 2 presents the related work, in particular recently proposed DL approaches based on the NSL-KDD dataset. Section 3 introduces the NSL-KDD dataset and explains the proposed method, including data preprocessing, feature extraction and the developed classifiers. Section 4 reports the experimental results. Finally, Sections 5 and 6 discuss and conclude this work.
2. Related work
In the literature, several intrusion detection systems use the KDD99 and NSL-KDD datasets to measure the performance and effectiveness of the proposed models. For example, Ingre et al. [18] developed a three-layer MLP to detect the attack classes of the NSL-KDD dataset, achieving an accuracy of 79.9% for multi-classification and 81.2% for binary classification on the test set. Ibrahim et al. [19] proposed a novel method based on self-organizing maps (SOMs) for binary classification and reported a detection accuracy of up to 75.49% on the NSL-KDD test set. Similarly, Mohamed et al. [20] applied conventional learning techniques such as MLP and achieved an accuracy of up to 95.7% for binary classification; however, the authors partitioned the dataset into k = 10 folds. Gao et al. [21] developed a semi-supervised learning method based on fuzzy and ensemble learning theory; using the NSL-KDD dataset, they reported an accuracy of 84.54% on the test set. Alrawashdeh et al. [22] implemented a deep belief network (DBN) based on a Restricted Boltzmann Machine (RBM) architecture with a softmax output layer for multi-classification purposes. However, the proposed system was tested on only 10% of the KDD99 test samples, achieving an accuracy of up to 98% and a false alarm rate of 2.47%. In [23], the authors considered a Software Defined Networking (SDN) environment and proposed a deep neural network (DNN) for anomaly detection. Specifically, a neural network with three hidden layers was trained on the NSL-KDD dataset. However, only 6 features were used and only two-way discrimination was performed (normal vs. abnormal), with a reported accuracy of 75.75%. In [24], instead, Kim et al. proposed a deep neural network trained on the KDD99 dataset. The DNN consisted of four hidden layers of 100
hidden neurons, trained with the adaptive moment estimation (Adam) method. The authors claimed very good performances, but they used different subsets of the original KDD99 dataset. Yan et al. [25] proposed a stacked sparse autoencoder (SSAE) to detect the categories of the NSL-KDD dataset. The authors claimed an accuracy of up to 98.63%, but they simplified the experimental process by shuffling and reassembling the original data into several independent datasets. Similarly, Xu et al. [26] developed an IDS based on deep neural networks to classify samples of the NSL-KDD dataset, achieving high performances; however, they evaluated the effectiveness of the proposed model by performing 10-fold cross-validation on the original NSL-KDD data. Imamverdiyev et al. [27] developed three DL (Bernoulli-Bernoulli RBM, Gaussian-Bernoulli RBM, DBN) and three standard machine learning (SVM (radial basis), SVM (epsilon-SVR), and decision tree) architectures. Experimental results showed that the Bernoulli-Bernoulli RBM outperformed all other approaches with an accuracy of 73.23%. Javaid et al. [28] used sparse AE architectures and self-taught learning (STL) for detecting anomalies in the NSL-KDD dataset; the accuracy was 79.1% in multi-class classification. Yin et al. [29] developed a Recurrent Neural Network (RNN) based system for intrusion detection. The authors used the NSL-KDD dataset as a benchmark and performed both binary and multi-classification, achieving accuracies of 83.3% and 81.3%, respectively. Recently, Shone et al. [30] implemented a stacked non-symmetric deep autoencoder (SNDAE) architecture for cyber attack detection. In this study, the authors used the NSL-KDD dataset, reporting a multi-classification accuracy of 85.42%.
3. Materials and methods

In this section, the NSL-KDD dataset used in this work is first introduced; then, the proposed methodology (including pre-processing, feature extraction and classification) is described.
3.1. NSL-KDD database description
The NSL-KDD dataset is a subset of the original KDD99 dataset and is widely used as a benchmark in several intrusion detection systems. Indeed, NSL-KDD addresses some criticisms of the earlier KDD99, such as the redundant and duplicate records in the train and test sets that biased classifiers toward the more frequent samples. NSL-KDD is made freely available by the Canadian Institute for Cybersecurity [31]. It provides training and testing sets, here denoted as KDDTrain+ and KDDTest+, which include 125973 and 22544 instances, respectively. Since the KDDTest+ set contains seventeen additional attack types that do not appear in KDDTrain+, the instances corresponding to such categories (3751) were removed to ensure a fair classification. As a result, KDDTest+ was composed of 22544 − 3751 = 18793 examples. Further details on the KDDTrain+ and KDDTest+ sets are reported in Table 1.
The NSL-KDD dataset has 41 features z_f (f = 1, 2, ..., 41): 38 continuous and 3 symbolic,
as shown in Table 2. Furthermore, the attack types of the NSL-KDD dataset are clustered into four different attack classes:

1. DoS (Denial of Service): attacks that slow down or shut down a machine by sending more traffic to the server than the system is able to handle, thereby affecting legitimate network traffic or access to services.

2. R2L (Remote to Local): attacks that gain illegal local access to a machine by sending deceiving packets to the system from a remote host.

3. U2R (User to Root): attacks that gain root access; the hacker starts as a normal user and exploits a system vulnerability to obtain root privileges.

4. Probe (Probing): attacks that gather information about the network in order to evade its security controls.

The attack categories of the NSL-KDD dataset are reported in Table 3.
3.2. Methodology
Figure 1 shows the procedure of the proposed methodology. Firstly, the NSL-KDD dataset is cleaned of outliers, and the min-max normalization technique is used to scale the data within the range [0, 1]. Afterwards, one-hot encoding is applied to convert the symbolic (or categorical) features into numeric values. Then, the 38 numeric attributes are analyzed statistically in order to select the most correlated features. Finally, shallow (MLP, L-SVM, Q-SVM, LDA, QDA) and deep (AE, LSTM) networks are developed to measure the detection performance in both the binary and multi-classification scenarios.
3.3. Data preprocessing
The proposed preprocessing stage prepares the data for the subsequent modules. It includes three units: outlier analysis, data normalization and one-hot encoding.
Table 2: Features of NSL-KDD dataset: 38 numeric (or continuous, cont) and 3 categorical
(or symbolic, symb).
z̃_fj = (z_fj − min(z_f)) / (max(z_f) − min(z_f))    (3)

where max(z_f) and min(z_f) represent the maximum and minimum values of the f-th (numeric) feature z_f, whereas z̃_fj is the normalized feature value, ranging in [0, 1].
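As a minimal illustration of Eq. (3), the following NumPy sketch (ours, not the authors' MATLAB implementation) rescales each numeric feature column to [0, 1]:

```python
import numpy as np

def min_max_normalize(Z):
    """Rescale each feature (column) of Z to [0, 1], as in Eq. (3)."""
    z_min = Z.min(axis=0)
    z_max = Z.max(axis=0)
    span = np.where(z_max > z_min, z_max - z_min, 1.0)  # guard constant columns
    return (Z - z_min) / span

Z = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
print(min_max_normalize(Z))  # each column now spans [0, 1]
```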
Table 3: Attack profiles of DoS, R2L, U2R, Probe classes.
[Figure 1: Flow diagram of the proposed methodology. The NSL-KDD data (KDDTrain+ and KDDTest+) undergo pre-processing (outlier analysis, normalization, one-hot encoding) and feature extraction, producing a 102-dimensional feature vector that feeds the AE, LSTM, MLP, L-SVM, Q-SVM, LDA and QDA classifiers, each discriminating the Normal, DoS, R2L and Probe classes.]
3.3.3. One-hot-encoding
The three categorical features protocol type, service and flag (z_2, z_3, z_4, respectively) were transformed into numerical values using the one-hot encoding technique, which represents each categorical attribute by binary values. For example, the z_2 feature (protocol type) has three attributes: tcp, udp and icmp. Applying one-hot encoding, these were converted into the binary vectors [1,0,0], [0,1,0] and [0,0,1], respectively. Similarly, the z_3 and z_4 features (service and flag) were also converted into one-hot vectors. Overall, the 41-dimensional feature vector was mapped into a 122-dimensional one (38 continuous features and 84 binary values derived from z_2, z_3 and z_4), as shown in the sketch below.
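For illustration, a toy pandas sketch of this encoding (ours; the column names merely mirror z_2, z_3 and z_4):

```python
import pandas as pd

# Toy values for the three symbolic features of NSL-KDD.
df = pd.DataFrame({
    "protocol_type": ["tcp", "udp", "icmp"],
    "service":       ["http", "ftp", "smtp"],
    "flag":          ["SF", "S0", "REJ"],
})
# Each symbolic value becomes one binary indicator column, so the three
# protocol types map to three-dimensional one-hot vectors.
encoded = pd.get_dummies(df, columns=["protocol_type", "service", "flag"])
print(encoded.astype(int))
```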
Table 4: NSL-KDD* dataset after discarding the outliers.
Figure 2: Histogram of null values included in the 38 numeric variables of the KDD*Train+ set. Features with more than 80% zeros are represented in red and are removed from the present study.
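A possible NumPy sketch of this selection step (ours; the paper does not publish code) drops the numeric features whose fraction of zero values exceeds 80%:

```python
import numpy as np

def drop_mostly_zero_features(X, max_zero_fraction=0.80):
    """Drop features (columns) whose fraction of zeros exceeds the
    threshold, mirroring the criterion illustrated in Figure 2."""
    zero_fraction = (X == 0).mean(axis=0)
    keep = zero_fraction <= max_zero_fraction
    return X[:, keep], keep

X = np.array([[0.0, 1.0], [0.0, 2.0], [0.0, 3.0], [0.0, 4.0], [0.0, 5.0]])
X_reduced, mask = drop_mostly_zero_features(X)
print(mask)  # [False  True]: the all-zero first column is removed
```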
3.5. Classification
Two deep architectures, based on an AE and an LSTM, and three shallow architectures, based on a standard MLP, SVM (with linear and quadratic kernels) and DA (with linear and quadratic discriminant functions), were developed to detect the normal and abnormal categories of the NSL-KDD dataset (Normal, DoS, R2L and Probe). Details are presented in the following subsections.
3.5.1. Autoencoder
In this study, a deep classifier based on an AE was developed. An AE consists of an encoder and a decoder: first, it transforms the input data vector into a (typically lower-dimensional) representation (encoder); then, it attempts to reconstruct the original input from the compressed vector (decoder). The AE is trained in an unsupervised fashion and is able to capture significant features from unlabeled data [32]. Figure 3 shows a classic AE model with a single hidden layer. The input data vector z is encoded into a lower-dimensional representation e:

e = ς(zW + b)    (4)

where W represents the weight matrix, b is the bias vector and ς denotes the activation function of the encoder. Afterwards, the decoding operation produces the reconstruction of the input (z) from the encoded representation (e):

z̃ = ζ(eW^T + b)    (5)

where ζ denotes the activation function of the decoder and z̃ is the reconstructed vector.
The saturating linear transfer function was used for both the encoding and reconstruction operations: ς(s) = 0 when s ≤ 0, ς(s) = s when 0 < s < 1, and ς(s) = 1 when s ≥ 1.
The reconstruction error between z and z̃ is quantified using the mean squared error (MSE). It is worth mentioning that the minimum error (0.0083) was obtained with 50 hidden neurons. After training the AE[102:50:102], the 50 latent features are used as the input of a dense fully-connected layer with a softmax activation function (the AE50 classifier, Figure 4 (b)). At this stage, the softmax layer is trained in a supervised fashion for binary or multi-classification purposes. Then, a fine-tuning approach is used: the whole architecture, shown in Figure 4 (b), is re-trained with a supervised learning algorithm in order to improve the classification performance. The AE50 classifier was developed using MATLAB R2018a (The MathWorks, Inc., Natick, MA, USA) and trained until the cross-entropy loss function [33] converged, i.e., for 3 × 10² epochs.
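The following Keras sketch (ours; the authors used MATLAB, and relu/sigmoid stand in here for the saturating linear transfer function) illustrates the AE[102:50:102] pre-training followed by the softmax head and end-to-end fine-tuning:

```python
from tensorflow import keras
from tensorflow.keras import layers

n_in, n_hidden, n_classes = 102, 50, 4  # Normal, DoS, R2L, Probe

inputs = keras.Input(shape=(n_in,))
encoded = layers.Dense(n_hidden, activation="relu")(inputs)   # encoder, Eq. (4)
decoded = layers.Dense(n_in, activation="sigmoid")(encoded)   # decoder, Eq. (5)

autoencoder = keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(X_train, X_train, ...)  # unsupervised pre-training on MSE

# Softmax head on the 50 latent features; re-training this whole model
# fine-tunes the encoder weights with supervised backpropagation.
outputs = layers.Dense(n_classes, activation="softmax")(encoded)
classifier = keras.Model(inputs, outputs)
classifier.compile(optimizer="adam", loss="categorical_crossentropy",
                   metrics=["accuracy"])
# classifier.fit(X_train, y_train_onehot, epochs=300)  # supervised fine-tuning
```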
Figure 4: (a) AE based classifier. The AE[102:50:102] reduces the 102-dimensional feature vector (z) into 50 latent features (e) and then reconstructs the original input from the 50 compressed features. (b) Afterwards, the feature vector e is used as the input of a final softmax layer (o) for binary or multi-classification. Finally, the whole structure is fine-tuned using the conventional backpropagation algorithm. The figure refers to the multi-class detection task.
An LSTM layer [34] is able to learn long-term dependencies between time steps in a sequence of data. The LSTM layer has two states: the hidden state (or output state), which contains the output at time step t, and the cell state, which stores the information learned from the previous time steps. At each time step t, the hidden and cell states are updated by means of the following gates:
c_t = f_t ⊙ c_{t−1} + i_t ⊙ g_t    (6)

h_t = o_t ⊙ tanh(c_t)    (7)

where:

i_t = σ_g(W_i z + R_i h_{t−1} + b_i)    (8)

f_t = σ_g(W_f z + R_f h_{t−1} + b_f)    (9)

g_t = tanh(W_g z + R_g h_{t−1} + b_g)    (10)

o_t = σ_g(W_o z + R_o h_{t−1} + b_o)    (11)

and where σ_g represents the sigmoid activation function, W the input weight matrices, R the recurrent weight matrices and b the bias vectors.
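A self-contained NumPy sketch of a single LSTM step implementing Eqs. (6)-(11) (ours, for illustration; the dimensions follow the paper's 50-unit layer):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(z, h_prev, c_prev, W, R, b):
    """One LSTM time step; W, R, b are dicts keyed by gate ('i','f','g','o')."""
    i = sigmoid(W['i'] @ z + R['i'] @ h_prev + b['i'])  # input gate, Eq. (8)
    f = sigmoid(W['f'] @ z + R['f'] @ h_prev + b['f'])  # forget gate, Eq. (9)
    g = np.tanh(W['g'] @ z + R['g'] @ h_prev + b['g'])  # candidate, Eq. (10)
    o = sigmoid(W['o'] @ z + R['o'] @ h_prev + b['o'])  # output gate, Eq. (11)
    c = f * c_prev + i * g                              # cell state, Eq. (6)
    h = o * np.tanh(c)                                  # hidden state, Eq. (7)
    return h, c

rng = np.random.default_rng(0)
n_in, n_h = 102, 50
W = {k: 0.1 * rng.standard_normal((n_h, n_in)) for k in "ifgo"}
R = {k: 0.1 * rng.standard_normal((n_h, n_h)) for k in "ifgo"}
b = {k: np.zeros(n_h) for k in "ifgo"}
h, c = lstm_step(rng.standard_normal(n_in), np.zeros(n_h), np.zeros(n_h), W, R, b)
```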
Figure 6: LSTM architecture. Similarly to the AE architecture, it consists of one hidden layer of 50 units followed by a softmax layer for binary or multi-classification. The architecture in the figure refers to the multi-classification task.
Figure 7: MLP architecture. Similarly to the AE architecture, it consists of one hidden layer of 50 units followed by a softmax layer for binary or multi-classification. The architecture in the figure refers to the multi-classification task.
4. Experimental results
The classification performance was evaluated in terms of precision, recall, F1 score and accuracy:

Precision = TP / (TP + FP)    (12)

Recall = TP / (TP + FN)    (13)

F1 score = 2 · (Precision · Recall) / (Precision + Recall)    (14)

Accuracy = (TP + TN) / (TP + FP + TN + FN)    (15)
where TP (True Positive) is the number of instances correctly detected as anomalous; TN (True Negative) is the number of instances correctly detected as normal; FP (False Positive) is the number of normal traffic patterns misclassified as anomalous; and FN (False Negative) is the number of anomalous traffic patterns erroneously identified as normal. In order to estimate the ability of the classifiers to correctly detect normal and abnormal traffic, the performances of the proposed architectures (AE, LSTM, MLP, L-SVM, Q-SVM, LDA, QDA) were studied in both the binary (Normal, Abnormal) and multi-classification (Normal, DoS, Probe, R2L) modalities. It is to be noted that, since the F1 score combines precision and recall (Eq. (14)), the following considerations are based mainly on this metric.
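A direct transcription of Eqs. (12)-(15) (a sketch of ours, not the authors' evaluation code):

```python
def classification_metrics(tp, fp, tn, fn):
    """Precision, recall, F1 score and accuracy as in Eqs. (12)-(15)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return precision, recall, f1, accuracy

# Toy confusion counts, for illustration only.
print(classification_metrics(tp=80, fp=10, tn=90, fn=20))
```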
4.1. Binary classification
Table 6 reports the outcomes of the binary classification experiments, where the abnormal class includes the DoS, Probe and R2L categories. The MLP classifier achieved F1 scores of 86.65% and 70.69% in detecting the normal and abnormal categories, respectively. As regards the SVM classifiers, Q-SVM outperformed L-SVM in terms of average F1 score (81.39% vs. 77.54%). Indeed, the Q-SVM classifier achieved better performance in detecting both the normal and the abnormal class (F1 scores of 87.87% and 74.90%, respectively). As regards discriminant analysis, QDA similarly outperformed LDA in terms of average F1 score (77.78% vs. 75.16%); however, the LDA classifier achieved a better F-measure in detecting normal samples (85.26%), whereas QDA was better in detecting anomalies (71.66%). As regards the deep classifiers, the LSTM architecture achieved an average F1 score of 79.24%, while the proposed deep AE classifier showed the highest average F1 score (81.98%). Moreover, the AE based classifier also outperformed the aforementioned methods in terms of accuracy (Table 5), with a rate of up to 84.21%, as compared to the LSTM, MLP, L-SVM, Q-SVM, LDA and QDA classifiers, which achieved accuracies of 82.04%, 81.65%, 80.8%, 83.15%, 79.27% and 76.84%, respectively. Similar results were obtained for the area under the curve (AUC) of the Receiver Operating Characteristic (ROC) [39]. Indeed, as can be seen in Figure 8 (a), the AE classifier reported the best AUC score (AUC_AE = 95.55%).
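For completeness, a minimal scikit-learn sketch (ours; the scores below are toy values, not the paper's outputs) of how such ROC/AUC figures can be computed:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# y_true marks anomalous traffic (1) vs. normal (0); y_score is the
# classifier's anomaly probability (e.g., a softmax output).
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.8, 0.7, 0.9, 0.3, 0.6, 0.2])
fpr, tpr, _ = roc_curve(y_true, y_score)
print(f"AUC = {roc_auc_score(y_true, y_score):.3f}")
```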
4.2. Multi-classification
Table 7 reports the outcomes of the multi-classification experiments. As in the binary classification analysis, the shallow (MLP, L-SVM, Q-SVM, LDA, QDA) and deep (LSTM, AE) classifiers were compared. The simulation results showed the following. The MLP classifier reported F1 scores of 87.10%, 97.08% and 77.13% for the Normal, DoS and Probe classes, respectively. The Q-SVM outperformed the L-SVM classifier in terms of average F1 score (75.11% vs. 69.73%); specifically, L-SVM reported F1 scores of 86.55%, 96.69% and 86.22%, whereas Q-SVM reported 88.32%, 97.41% and 82.81% for the Normal, DoS and Probe classes, respectively. The LDA, instead, outperformed the QDA classifier in terms of average F1 score (76.49% vs. 64.36%): the LDA classifier reported F-measures of 90.69%, 91.14% and 70.87%, whereas the QDA classifier achieved 87.98%, 74.64% and 47.86% for the Normal, DoS and Probe classes, respectively. However, it is to be noted that the MLP, L-SVM and Q-SVM based classifiers were not able to discriminate the R2L attack category accurately (F1 scores of 11.74%, 9.45% and 31.90%, respectively). The DA classifiers, instead, achieved better results in detecting the R2L anomaly, with F1 scores of 53.27% (LDA) and 46.96% (QDA). As regards the deep classifiers, the LSTM architecture achieved an average F-measure of 67.17%; it reported very good discriminating performance in detecting the Normal, DoS and Probe attack types (F-measures of 86.12%, 96.90% and 84.05%, respectively) but remained inadequate in detecting the R2L
attack class. As regards the deep AE classifier, similarly to the binary classification experiments, it outperformed all the other machine learning algorithms, reporting F1 scores of up to about 98%. Furthermore, it is worth mentioning that the AE classifier also outperformed the LSTM and the conventional techniques in terms of accuracy, achieving the highest value of 87%, whereas the LSTM, MLP, L-SVM, Q-SVM, LDA and QDA classifiers reported accuracies of 80.67%, 81.43%, 81.4%, 83.65%, 83.17% and 79.47%, respectively. Also in this scenario, the result was confirmed by the AUC: as can be seen in Figure 8 (b), the AE classifier reported the best performance (AUC_AE = 96.1%).
Table 5: Accuracies of the proposed AE, LSTM, MLP, L-SVM, Q-SVM, LDA and QDA classifiers for binary and multi-classification.
Table 6: Binary classification performance (Precision, Recall, F1 score) of AE, LSTM, MLP,
L-SVM, Q-SVM, LDA, QDA classifiers.
Precision
Attack class AE LSTM MLP L-SVM Q-SVM LDA QDA
Normal 81.09% 79.00% 78.49% 77.68% 80.84% 75.81% 81.08%
Abnormal 92.91% 91.25% 91.57% 90.96% 91.34% 92.38% 76.34%
AVG 87% 85.13% 85.03% 84.32% 86.09% 84.09% 78.71%
Recall
Attack class AE LSTM MLP L-SVM Q-SVM LDA QDA
Normal 96.96% 96.47% 96.69% 96.55% 96.24% 97.41% 86.94%
Abnormal 63.79% 58.92% 57.57% 55.56% 63.48% 50.22% 67.52%
AVG 80.37% 77.70% 77.13% 76.06% 79.86% 73.81% 77.23%
F1 score
Attack class AE LSTM MLP L-SVM Q-SVM LDA QDA
Normal 88.32% 86.86% 86.65% 86.09% 87.87% 85.26% 83.91%
Abnormal 75.64% 71.61% 70.69% 68.99% 74.90% 65.07% 71.66%
AVG 81.98% 79.24% 78.67% 77.54% 81.39% 75.16% 77.78%
5. Discussion
Table 7: Multi-classification performance (Precision, Recall, F1 score) of AE, LSTM, MLP,
L-SVM, Q-SVM, LDA, QDA classifiers.
Precision
Attack class AE LSTM MLP L-SVM Q-SVM LDA QDA
Normal 85.03% 77.70% 79.46% 78.37% 81.35% 88.01% 84.28%
DoS 97.05% 96.39% 97.21% 96.52% 97.71% 95.45% 97.79%
Probe 69.82% 75.26% 64.33% 77.03% 71.15% 60.20% 31.62%
R2L 99.49% 80.00% 99.19% 95.15% 98.68% 79.62% 99.34%
AVG 87.85% 82.34% 85.05% 86.77% 87.22% 80.82% 78.26%
Recall
Attack class AE LSTM MLP L-SVM Q-SVM LDA QDA
Normal 96.19% 96.58% 96.35% 96.64% 96.59% 93.53% 92.02%
DoS 98.18% 97.42% 96.96% 96.86% 97.11% 87.19% 60.35%
Probe 94.03% 95.16% 96.29% 97.90% 99.03% 86.13% 98.39%
R2L 39.78% 0.81% 6.24% 4.97% 19.03% 40.03% 30.75%
AVG 82.04% 72.49% 73.96% 74.09% 77.94% 76.72% 70.38%
F1 score
Attack class AE LSTM MLP L-SVM Q-SVM LDA QDA
Normal 90.27% 86.12% 87.10% 86.55% 88.32% 90.69% 87.98%
DoS 97.61% 96.90% 97.08% 96.69% 97.41% 91.14% 74.64%
Probe 80.14% 84.05% 77.13% 86.22% 82.81% 70.87% 47.86%
R2L 56.83% 1.61% 11.74% 9.45% 31.90% 53.27% 46.96%
AVG 81.21% 67.17% 68.26% 69.73% 75.11% 76.49% 64.36%
The strengths and effectiveness of the proposed IDS were evaluated using standard measurements, including precision, recall, F1 score and accuracy. The most correlated features were extracted through statistical analysis and used as input to the deep (AE, LSTM) and shallow ML approaches, including MLP, L-SVM, Q-SVM, LDA and QDA. Moreover, both binary (Normal vs. Abnormal) and multi-classification (Normal vs. DoS vs. R2L vs. Probe) were performed. As regards the shallow ML architectures, experimental results showed that the Q-SVM classifier achieved the best performances for both binary (83.15% accuracy) and multi-class discrimination (83.65% accuracy), as compared with the MLP, L-SVM, LDA and QDA classifiers. As regards the deep ML architectures, the AE classifier achieved the highest performances for both binary (84.21% accuracy) and multi-class discrimination (87% accuracy), as compared with the LSTM classifier. Hence, the comparative results with shallow and deep classifiers showed that the deep autoencoder architecture outperformed the other proposed ML approaches. Furthermore, this result was also confirmed in terms of AUC: 95.55% and 96.1% for binary and multi-classification, respectively. It is to be noted that the optimal AE structure was found by estimating the performance of different numbers of hidden layers (HL) and hidden units. Specifically, Table 9 reports the binary and multi-classification accuracies of the different AE architectures. As can be seen, the minimum classification performance was 79.56%, obtained with the AE40,25,12 (in the binary anomaly detection scenario), and 69.04%, obtained with the AE50,30,12 (in the multi-class anomaly detection scenario). However, the highest performance
was achieved by the proposed AE50: 84.21% for binary classification and 87% for multi-classification. For a fair comparison, the shallow and deep networks were developed with the same structure; indeed, both the MLP and the LSTM classifier consisted of 50 hidden units. Furthermore, the proposed deep AE was also compared with the most recent approaches in the literature that use the NSL-KDD dataset. Since most of the existing works focus on discriminating the NSL-KDD attack types, we compared the performance of the AE classifier in the multi-classification modality.
Recently, the authors in [40] proposed a hardware-software co-design machine learning accelerator based on a sequential learning algorithm, achieving an accuracy of up to 76.04% with a training time of 144.5 s. Similarly, the authors in [28] proposed a sparse AE architecture reporting an accuracy of 79.10%, whereas in [27] the authors designed a Gaussian-Bernoulli RBM consisting of 7 layers of 100 neurons, achieving an accuracy of up to 73.23%. A stacked non-symmetric deep AE was developed in [30]: specifically, the authors proposed a 3-hidden-layer AE combined with a Random Forest classifier, achieving a multi-classification accuracy of up to 85.42% with a minimum training time of 644.84 s. In [29], the authors modelled an RNN-IDS with 80 hidden units, reporting a multi-class accuracy of 81.29% and a training time of 11444 s.
In contrast to these approaches, we proposed a statistical analysis driven intelligent AE classifier that achieved a multi-class accuracy of up to 87%. It is to be noted that, although the proposed IDS outperformed the aforementioned works, the margin over Shone et al. [30] was the smallest (about 1.6% in accuracy). Nevertheless, the AE developed here has a very simple architecture, with only one hidden layer of just 50 units. Consequently, the proposed IDS is optimized in terms of the number of learning parameters and of training time: indeed, the training process, executed on a high-performance GeForce RTX 2080 Ti GPU installed on a machine with an Intel(R) Core(TM) i7-8000K CPU and 64 GB RAM, took only 22.53 s.
Table 8: Performance of the proposed IDS compared with recent state-of-the-art techniques.
Method Accuracy
Proposed AE 87%
Proposed LSTM 80.67%
Imamverdiyev et al. [27] 73.23%
Huang et al. [40] 76.04%
Javaid et al. [28] 79.10%
Yin et al. [29] 81.29%
Shone et al. [30] 85.42%
6. Conclusion
In this paper, a novel statistical analysis and autoencoder driven intelligent intrusion detection approach was presented. The proposed IDS was tested on the benchmark NSL-KDD dataset.
Table 9: Evaluation performance of AE with different hidden layers (HL).
Classifier      HL1   HL2   HL3   Accuracy (Binary)   Accuracy (Multi-class)
AE40            40    -     -     80.87%              78.12%
AE40,20         40    20    -     80.65%              77.17%
AE40,20,12      40    20    12    79.97%              77.83%
AE40,25         40    25    -     80.28%              79.36%
AE40,25,12      40    25    12    79.56%              78.23%
AE40,30         40    30    -     80.48%              79.00%
AE40,30,12      40    30    12    79.84%              76.51%
AE50            50    -     -     84.24%              87%
AE50,20         50    20    -     81.07%              78.62%
AE50,20,12      50    20    12    80.77%              75.70%
AE50,25         50    25    -     82.03%              81.84%
AE50,25,12      50    25    12    81.36%              80.65%
AE50,30         50    30    -     81.42%              81.13%
AE50,30,12      50    30    12    80.84%              69.04%
AE60            60    -     -     80.26%              79.82%
AE60,20         60    20    -     80.49%              79.23%
AE60,20,12      60    20    12    79.94%              77.51%
AE60,25         60    25    -     81.28%              79.26%
AE60,25,12      60    25    12    81.18%              78.68%
AE60,30         60    30    -     80.48%              79.27%
AE60,30,12      60    30    12    80.24%              74.50%
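The architecture search summarized in Table 9 can be reproduced in spirit with a loop such as the following hedged Keras sketch (ours; training and evaluation calls are elided):

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_ae_classifier(hidden_sizes, n_in=102, n_classes=4):
    """Stacked encoder with the given hidden sizes plus a softmax head."""
    inputs = keras.Input(shape=(n_in,))
    x = inputs
    for units in hidden_sizes:
        x = layers.Dense(units, activation="relu")(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Candidate depths/widths mirroring Table 9, e.g. AE50 = [50],
# AE50,25,12 = [50, 25, 12].
for hidden in ([40], [40, 20], [50], [50, 25, 12], [60, 30]):
    model = build_ae_classifier(hidden)
    # model.fit(...) and evaluation on KDDTest+ would populate Table 9.
```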
7. Acknowledgements
This work was funded by the UK EPSRC (Engineering and Physical Sciences
Research Council) grant, code: EP/M026981/1.
Figure 8: ROC curves of the proposed AE, LSTM, MLP, L-SVM, Q-SVM, LDA, QDA clas-
sifiers for binary (a) and multi-classification (b).
References
[13] G. Zhong, S. Yan, K. Huang, Y. Cai, J. Dong, Reducing and stretch-
ing deep convolutional activation features for accurate image classification,
Cognitive Computation 10 (1) (2018) 179–186.
[14] N. Zeng, H. Zhang, B. Song, W. Liu, Y. Li, A. M. Dobaie, Facial expression
recognition via learning deep sparse autoencoders, Neurocomputing 273
(2018) 643–649.
[15] L. Wang, B. Jiang, Z. Tu, A. Hussain, J. Tang, Robust pixelwise saliency
detection via progressive graph rankings, Neurocomputing 329 (2019) 433–
446.
[16] M. M. Najafabadi, F. Villanustre, T. M. Khoshgoftaar, N. Seliya, R. Wald,
E. Muharemagic, Deep learning applications and challenges in big data
analytics, Journal of Big Data 2 (1) (2015) 1.
[17] M. Tavallaee, E. Bagheri, W. Lu, A. A. Ghorbani, A detailed analysis
of the kdd cup 99 data set, in: Computational Intelligence for Security
and Defense Applications, 2009. CISDA 2009. IEEE Symposium on, IEEE,
2009, pp. 1–6.
[18] B. Ingre, A. Yadav, Performance analysis of nsl-kdd dataset using ann, in:
Signal Processing And Communication Engineering Systems (SPACES),
2015 International Conference on, IEEE, 2015, pp. 92–96.
[19] L. M. Ibrahim, D. T. Basheer, M. S. Mahmod, A comparison study for
intrusion database (kdd99, nsl-kdd) based on self organization map (som)
artificial neural network, Journal of Engineering Science and Technology
8 (1) (2013) 107–119.
[20] H. Mohamed, H. Hefny, A. Alsawy, Intrusion detection system using ma-
chine learning approaches, Egyptian Computer Science Journal 42 (3).
[21] Y. Gao, Y. Liu, Y. Jin, J. Chen, H. Wu, A novel semi-supervised learning
approach for network intrusion detection on cloud-based robotic system,
IEEE Access.
[22] K. Alrawashdeh, C. Purdy, Toward an online anomaly intrusion detection
system based on deep learning, in: Machine Learning and Applications
(ICMLA), 2016 15th IEEE International Conference on, IEEE, 2016, pp.
195–200.
[23] T. A. Tang, L. Mhamdi, D. McLernon, S. A. R. Zaidi, M. Ghogho, Deep
learning approach for network intrusion detection in software defined net-
working, in: Wireless Networks and Mobile Communications (WINCOM),
2016 International Conference on, IEEE, 2016, pp. 258–263.
[24] J. Kim, N. Shin, S. Y. Jo, S. H. Kim, Method of intrusion detection using
deep neural network, in: Big Data and Smart Computing (BigComp), 2017
IEEE International Conference on, IEEE, 2017, pp. 313–316.
[25] B. Yan, G. Han, Effective feature extraction via stacked sparse autoencoder
to improve intrusion detection system, IEEE Access 6 (2018) 41238–41248.
[26] C. Xu, J. Shen, X. Du, F. Zhang, An intrusion detection system using
a deep neural network with gated recurrent units, IEEE Access 6 (2018)
48697–48707.
[27] Y. Imamverdiyev, F. Abdullayeva, Deep learning method for denial of ser-
vice attack detection based on restricted boltzmann machine, Big Data
6 (2) (2018) 159–169.
[28] A. Javaid, Q. Niyaz, W. Sun, M. Alam, A deep learning approach for
network intrusion detection system, in: Proceedings of the 9th EAI In-
ternational Conference on Bio-inspired Information and Communications
Technologies (formerly BIONETICS), ICST (Institute for Computer Sci-
ences, Social-Informatics and Telecommunications Engineering), 2016, pp.
21–26.
[29] C. Yin, Y. Zhu, J. Fei, X. He, A deep learning approach for intrusion
detection using recurrent neural networks, IEEE Access 5 (2017) 21954–
21961.
[30] N. Shone, T. N. Ngoc, V. D. Phai, Q. Shi, A deep learning approach to
network intrusion detection, IEEE Transactions on Emerging Topics in
Computational Intelligence 2 (1) (2018) 41–50.
[31] M. Tavallaee, E. Bagheri, W. Lu, A. A. Ghorbani, NSL-KDD dataset, available at http://www.unb.ca/research/iscx/dataset/iscx-NSL-KDD-dataset.html [Accessed 28 Feb. 2016].
[32] G. E. Hinton, R. R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science 313 (5786) (2006) 504–507.
[33] J. Shore, R. Johnson, Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy, IEEE Transactions on Information Theory 26 (1) (1980) 26–37.
[34] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural computa-
tion 9 (8) (1997) 1735–1780.
[35] B. Yegnanarayana, Artificial neural networks, PHI Learning Pvt. Ltd.,
2009.
[36] V. N. Vapnik, An overview of statistical learning theory, IEEE transactions
on neural networks 10 (5) (1999) 988–999.
[37] I. Steinwart, A. Christmann, Support vector machines, Springer Science &
Business Media, 2008.
[38] A. J. Izenman, Linear discriminant analysis, in: Modern multivariate sta-
tistical techniques, Springer, 2013, pp. 237–280.
[39] A. P. Bradley, The use of the area under the roc curve in the evaluation of
machine learning algorithms, Pattern recognition 30 (7) (1997) 1145–1159.
[40] H. Huang, R. S. Khalid, W. Liu, H. Yu, Work-in-progress: a fast online
sequential learning accelerator for iot network intrusion detection, in: Hard-
ware/Software Codesign and System Synthesis (CODES+ ISSS), 2017 In-
ternational Conference on, IEEE, 2017, pp. 1–2.
[41] A. Adeel, H. Larijani, A. Ahmadinia, Random neural network based novel
decision making framework for optimized and autonomous power control
in lte uplink system, Physical Communication 19 (2016) 106–117.
Cosimo Ieracitano received the M.Eng. (summa cum laude) and Ph.D. de-
grees (with additional label of Doctor Europaeus) from the University Mediter-
ranea of Reggio Calabria (UNIRC), Italy, in 2013 and 2019, respectively. He
is currently a Research Fellow at the Neurolab group of the DICEAM Depart-
ment of the same University (UNIRC). He was a Visiting Master Student at
ETH Zurich and a Visiting PhD Student at the University of Stirling in 2013
and 2018, respectively. He is an author/co-author of publications in peer-reviewed national/international journals and of conference contributions. He is Local Ar-
rangements Chair for IEEE WCCI 2020. His main research interests include:
information theory, machine learning, deep learning techniques and biomedical
signal processing, in particular EEG signals of subjects affected by neuropatholo-
gies.
Ahsan Adeel holds B. Eng. (Hons), MSc (EEE), and PhD (Cognitive Com-
puting) degrees. Following a prestigious EPSRC/MRC fellowship at the Uni-
versity of Stirling (2016-18), he is currently a Lecturer (Assistant Professor) in
Computing Science at the University of Wolverhampton, UK, where he is lead-
ing the Conscious Multisensory Integration (CMI) Lab. He is a Visiting Fellow
at MIT Synthetic Intelligence Lab and Computational Neuroscience Lab (Uni-
versity of Oxford). His ongoing multidisciplinary research aims to explore and
exploit the power of advanced AI to design unorthodox brain-inspired cognitive
computing architectures by integrating suitable deep machine learning, reason-
ing, and optimization algorithms. His focused approaches include biophysical
and hardware-efficient neural models, explainable artificial intelligence, opti-
mized resource management, multimodal fusion, context-aware decision-making,
low power 5G IoT devices, and neuromorphic chips.
Francesco C. Morabito (M'89 - SM'00) was the Dean of the Faculty of Engineering and Deputy Rector of the University Mediterranea of Reggio Calabria, Reggio Calabria, Italy, where he is currently a Full Professor of
Electrical Engineering. He is also serving as the Vice-Rector for International
and Institutional Relations. He has authored or co-authored over 400 papers
in international journals/conference proceedings in various fields of engineering
(radar data processing, nuclear fusion, biomedical signal processing, nondestruc-
tive testing and evaluation, machine learning, and computational intelligence).
He has co-authored 15 books and holds three international patents. Prof. Mora-
bito is a Foreign Member of the Royal Academy of Doctors, Spain, in 2004, and
a member of the Institute of Spain, Barcelona Economic Network, in 2017. He
served as the Governor of the International Neural Network Society for 12 years
and as the President of the Italian Neural Network Society from 2008 to 2014. He is
a member on the editorial boards of various international journals, including
the International Journal of Neural Systems, Neural Networks, International
Journal of Information Acquisition, and Renewable Energy.
Amir Hussain obtained his B.Eng. (with the highest 1st Class Honors)
and Ph.D. (in novel neural network architectures and algorithms) from the Uni-
versity of Strathclyde in Glasgow, Scotland, UK, in 1992 and 1997 respectively.
Following postdoctoral and academic positions at the University of the West of Scotland (1996-98), the University of Dundee (1998-2000), and the University of Stirling
(2000-2018) respectively, he joined Edinburgh Napier University, in Scotland,
UK, in 2018, as Professor of Computing Science, and founding Director of the
Cognitive Big Data and Cybersecurity (CogBiD) Research Laboratory. His re-
search interests are cross-disciplinary and industry focussed, and include secure
and context-aware 5G-IoT driven AI, and multi-modal cognitive and sentic com-
puting techniques and applications. He has published more than 400 papers,
including over a dozen books and around 150 journal papers. He has led ma-
jor national, European and international projects and supervised more than 30 PhD students. He is founding Editor-in-Chief of two leading journals: Cogni-
tive Computation (Springer Nature), and BMC Big Data Analytics (BioMed
Central); and Chief-Editor of the Springer Book Series on Socio-Affective Com-
puting, and Cognitive Computation Trends. He has been appointed invited
Associate Editor of several prestigious journals, including the IEEE Transac-
tions on Neural Networks and Learning Systems, the IEEE Transactions on
Emerging Topics in Computational Intelligence, and (Elsevier) Information Fu-
sion. He is Vice-Chair of the Emergent Technologies Technical Committee of
the IEEE Computational Intelligence Society (CIS), and Chapter Chair of the
IEEE UK and RI Industry Applications Society.
Declaration of interests
The authors declare that they have no known competing financial interests or
personal relationships that could have appeared to influence the work reported
in this paper.