
A Novel Statistical Analysis and Autoencoder Driven Intelligent Intrusion Detection Approach

Communicated by Dr. Nianyin Zeng

Journal Pre-proof

A Novel Statistical Analysis and Autoencoder Driven Intelligent Intrusion Detection Approach

Cosimo Ieracitano, Ahsan Adeel, Francesco Carlo Morabito, Amir Hussain

PII: S0925-2312(19)31575-9
DOI: https://doi.org/10.1016/j.neucom.2019.11.016
Reference: NEUCOM 21521

To appear in: Neurocomputing

Received date: 5 March 2019


Revised date: 30 September 2019
Accepted date: 7 November 2019

Please cite this article as: Cosimo Ieracitano, Ahsan Adeel, Francesco Carlo Morabito, Amir Hussain,
A Novel Statistical Analysis and Autoencoder Driven Intelligent Intrusion Detection Approach,
Neurocomputing (2019), doi: https://doi.org/10.1016/j.neucom.2019.11.016

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition
of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of
record. This version will undergo additional copyediting, typesetting and review before it is published
in its final form, but we are providing this version to give early visibility of the article. Please note that,
during the production process, errors may be discovered which could affect the content, and all legal
disclaimers that apply to the journal pertain.

© 2019 Published by Elsevier B.V.


A Novel Statistical Analysis and Autoencoder Driven
Intelligent Intrusion Detection Approach

Cosimo Ieracitano a,∗, Ahsan Adeel b, Francesco Carlo Morabito a, Amir Hussain c

a DICEAM, University Mediterranea of Reggio Calabria, Via Graziella, Feo di Vito, 89060 Reggio Calabria, Italy
b School of Mathematics and Computer Science, University of Wolverhampton, Edinburgh EH16 5XW, UK
c School of Computing, Edinburgh Napier University, Edinburgh EH10 5DT, UK

Abstract

In the current digital era, one of the most critical and challenging issues is ensuring cybersecurity in information technology (IT) infrastructures. Indeed, with the significant improvement of technology, hackers have been developing ever more complex and dangerous malware attacks that make intrusion recognition a very difficult task. In this context, existing traditional analytic tools face severe challenges in detecting and mitigating these threats. In this work, we introduce a statistical analysis and autoencoder (AE) driven intelligent intrusion detection system (IDS). Specifically, the proposed IDS combines data analytics and statistical techniques with recent advances in machine learning theory to extract optimized and more correlated features. The validity of the proposed IDS is tested using the benchmark NSL-KDD database. Experimental results show that the designed IDS achieves better classification performance than deep and conventional shallow machine learning approaches, as well as recently proposed state-of-the-art techniques.

Keywords: Anomaly Detection, Deep Learning, Autoencoder, Optimized Feature Extraction, NSL-KDD database.

1. Introduction

Internet-connected services have been increasing rapidly over the years. Indeed, it is estimated that fifty billion devices will be connected to the Internet by 2020 [1]. However, despite the resulting benefits, information and communication technology systems are continuously exposed to cyber risks. Malware attacks

∗ Corresponding author
Email addresses: [email protected] (Cosimo Ieracitano), [email protected] (Ahsan Adeel), [email protected] (Francesco Carlo Morabito), [email protected] (Amir Hussain)

Preprint submitted to Neurocomputing November 13, 2019


have become more complex and difficult to detect, and can cause significant economic and social consequences. Billions of dollars are lost annually due to breaches of IT services, and this figure is expected to grow in the coming years [2]. As a result, cybersecurity has become a priority issue in modern society. To this end, monitoring and analysing network traffic data is of fundamental importance for detecting potential attack patterns. In this context, in order to develop ever more intelligent Intrusion Detection Systems (IDS) able to prevent malicious threats and ensure improved cybersecurity, companies and IT enterprises worldwide have been investing in so-called data science. This concept encompasses a set of techniques from computing, statistics and information technology, such as Machine Learning (ML). However, the large amount of heterogeneous big data generated by different sources makes traditional data analytics and ML approaches ineffective and inadequate for directly handling this security challenge. Notably, conventional ML techniques suffer from limited computational capacity and cannot learn the complex non-linear relationships that exist within big datasets. Hence, in order to overcome the aforementioned limitations, and consequently enhance intrusion detection performance, here we combine, for the first time, traditional data analysis and statistical techniques with recent advances in ML. Specifically, Deep Learning (DL, [3]) technology is employed to develop a more sophisticated security IDS. Indeed, deep algorithms are able to extract high levels of abstraction automatically from input representations through a hierarchical learning process. DL has achieved interesting results in various real-world research domains (such as bioengineering [4], [5], [6], [7], [8], sentiment analysis [9], [10], image recognition [11], [12], [13], [14], and saliency detection [15]) and, recently, it has also been employed to develop advanced intrusion detection systems [16].
In this paper, a statistical analysis and AE driven intelligent IDS is proposed to identify normal and abnormal network events. The NSL-KDD dataset from the Canadian Institute for Cybersecurity (an updated version of the original KDD Cup 1999 Data (KDD99) [17]) is used as a benchmark to evaluate the proposed DL driven IDS. Specifically, the framework introduced here includes three main processing units: 1) a data preprocessing module, 2) a feature extraction module, and 3) a classification module. The data preprocessing module removes outliers, scales the features into the range [0, 1] and transforms categorical features into numerical values using the one-hot-encoding technique. The feature extraction module extracts the most correlated variables, removing those features with more than 80% null values. In the classification module, an autoencoder based deep classifier is proposed to discriminate different groups of the NSL-KDD dataset, in both binary and multi-class settings. The binary classes are Normal and Abnormal (the DoS, R2L and Probe attack categories), whereas the classes for multi-classification are Normal, DoS, R2L and Probe. The performance of the AE classifier is compared with a deep classifier (Long Short-Term Memory, LSTM) and conventional shallow classifiers (Multi Layer Perceptron (MLP), Linear Support Vector Machine (L-SVM), Quadratic Support Vector Machine (Q-SVM), and Discriminant Analysis with linear (LDA) and quadratic (QDA) discriminant functions), as well as existing deep learning methods. As can be seen in Tables 5 and 8, the proposed AE classifier outperformed all other approaches, reporting accuracy values of 84.21% and 87% in binary and multi-classification, respectively.
The main contributions of this work can be summarized as follows:

• development of an innovative IDS based on data analytics and DL technologies;

• development of an IDS able to effectively discriminate different cyber-attack classes of the NSL-KDD dataset with very good accuracy;

• development of an IDS with significant potential for exploitation in industrial applications.

The present paper is organized as follows. Section 2 presents related work, in particular recently proposed DL approaches based on the NSL-KDD dataset. Section 3 introduces the NSL-KDD dataset and explains the proposed method, including data preprocessing, feature extraction and the developed classifiers. Section 4 reports the experimental results. Finally, Sections 5 and 6 discuss and conclude this work.

2. Related work

In the literature there are several intrusion detection systems that use the KDD99 and NSL-KDD datasets to measure the performance and effectiveness of the proposed models. For example, Ingre et al. [18] developed a three-layer MLP to detect the attack classes of the NSL-KDD dataset, achieving an accuracy of 79.9% for multi-classification and 81.2% for binary classification on the test set. Ibrahim et al. [19] proposed a method based on self-organizing maps (SOMs) for binary classification and reported detection accuracy up to 75.49% on the NSL-KDD test dataset. Similarly, Mohamed et al. [20] applied conventional learning techniques such as MLP and achieved accuracy up to 95.7% for binary classification; however, the authors partitioned the dataset into k = 10 folds. Gao et al. [21] developed a semi-supervised learning method based on fuzzy and ensemble learning theory. The authors used the NSL-KDD dataset, reporting an accuracy of 84.54% on the KDD test set. Alrawashdeh et al. [22] implemented a deep belief network (DBN) based on a Restricted Boltzmann Machine (RBM) architecture with a softmax output layer for multi-classification purposes. However, the proposed system was tested on only 10% of the KDD99 test samples, achieving accuracy and false alarm rates of up to 98% and 2.47%, respectively. In [23], the authors used the Software Defined Networking (SDN) environment and proposed a deep neural network (DNN) for anomaly detection. Specifically, a three-hidden-layer neural network was trained on the NSL-KDD dataset. However, only 6 features were used and only two-way discrimination (normal vs. abnormal) was performed. Experimental results reported an accuracy of 75.75%. In [24], instead, Kim et al. proposed a deep neural network trained on the KDD99 dataset. The DNN consisted of four hidden layers of 100
hidden neurons, trained with the adaptive moment estimation (Adam) method. The authors claimed very good performance, but they used different subsets of the original KDD99 dataset. Yan et al. [25] proposed a stacked sparse autoencoder (SSAE) to detect the categories of the NSL-KDD dataset. The authors claimed accuracy up to 98.63%, but they simplified the experimental process by shuffling and reassembling the original data into several independent datasets. Similarly, Xu et al. [26] developed an IDS based on deep neural networks to classify samples of the NSL-KDD dataset, achieving high performance. However, they evaluated the effectiveness of the proposed model by performing 10-fold cross-validation on the original NSL-KDD data. Imamverdiyev et al. [27] developed three DL (Bernoulli-Bernoulli RBM, Gaussian-Bernoulli RBM, DBN) and three standard machine learning (SVM (radial basis), SVM (epsilon-SVR), and decision tree) architectures. Experimental results showed that the Gaussian-Bernoulli RBM outperformed all the other approaches, with an accuracy rate of 73.23%. Javaid et al. [28] used sparse AE architectures and self-taught learning (STL) for detecting anomalies in the NSL-KDD dataset; the accuracy was 79.1% in multi-class classification. Yin et al. [29] developed a Recurrent Neural Network (RNN) based system for intrusion detection. The authors used the NSL-KDD dataset as a benchmark and performed both binary and multi-classification, achieving accuracy rates of 83.3% and 81.3%, respectively. Recently, Shone et al. [30] implemented a stacked non-symmetric deep autoencoder (SNDAE) architecture for cyber-attack detection. In this study, the authors used the NSL-KDD dataset, reporting a multi-classification performance of 85.42%.

3. Material and methodology

In this Section, the NSL-KDD dataset used in this work is first introduced. Then, the proposed methodology (including pre-processing, feature extraction and classification) is described.

3.1. NSL-KDD database description
The NSL-KDD is a subset of the original KDD99 dataset and is widely used as a benchmark in several intrusion detection systems. Indeed, NSL-KDD resolves some shortcomings of the earlier KDD99, such as the redundant and duplicate records in the train and test sets, which biased classifiers toward the more frequent samples. NSL-KDD is made freely available by the Canadian Institute for Cybersecurity [31]. It has a training and a testing set, here denoted as KDDTrain+ and KDDTest+, which include 125973 and 22544 instances, respectively. Since the KDDTest+ set contains seventeen additional attack types that are not included in KDDTrain+, for a fair classification the instances corresponding to such categories (3751) were removed. As a result, KDDTest+ was composed of 22544 - 3751 = 18793 examples. Further details on the KDDTrain+ and KDDTest+ sets are reported in Table 1.
The NSL-KDD dataset has 41 features z_f (f = 1, 2, ..., 41): 38 continuous and 3 symbolic, as shown in Table 2. Furthermore, the attack types of the NSL-KDD dataset are clustered into four different attack classes:
1. DoS (Denial of Service): DoS includes attacks that slow down or shut down a machine by sending more traffic to the server than the system is able to handle. DoS attacks affect legitimate network traffic or access to services.

2. R2L (Remote to Local): R2L includes attacks that gain illegal local access to a machine by sending deceiving packets to the system remotely.

3. U2R (User to Root): U2R includes attacks that gain root access: the hacker starts using the system as a normal user and then exploits a system vulnerability to obtain root privileges.

4. Probe (Probing): Probe includes attacks that evade security control systems by gathering information about the network.

The attack categories of the NSL-KDD dataset are reported in Table 3.

Table 1: Details of the NSL-KDD dataset.

NSL-KDD     Total    Normal   DoS     Probe   R2L    U2R
KDDTrain+   125973   67343    45927   11656   995    52
KDDTest+    18793    9710     5741    1106    2199   37

3.2. Methodology
Figure 1 shows the pipeline of the proposed methodology. Firstly, the NSL-KDD dataset is cleaned of outliers and the min-max normalization technique is used to scale the data into the range [0, 1]. Afterwards, one-hot-encoding is applied to convert symbolic (or categorical) features into numeric values. Then, the 38 numeric attributes are analyzed statistically in order to select the most correlated features. Finally, shallow (MLP, L-SVM, Q-SVM, LDA, QDA) and deep (AE, LSTM) networks are developed to measure the detection performance in both binary and multi-classification scenarios.
3.3. Data preprocessing
The proposed preprocessing stage prepares the data for proper processing by the subsequent modules. It includes three units: outlier analysis, data normalization and one-hot encoding.

3.3.1. Outliers analysis

The NSL-KDD dataset is filtered for inconsistent values (outliers), as this has proven to be an important operation before data normalization. Indeed, outliers can interfere with the learning process, causing misdetections in the proposed intrusion detection system. Here, outliers are identified using the Median Absolute Deviation Estimator (MADE), defined as follows:

Table 2: Features of the NSL-KDD dataset: 38 numeric (or continuous, cont) and 3 categorical (or symbolic, symb).

No.  Feature               Type   No.  Feature                        Type
z1   duration              cont   z22  is guest login                 cont
z2   protocol type         symb   z23  count                          cont
z3   service               symb   z24  srv count                      cont
z4   flag                  symb   z25  serror rate                    cont
z5   source bytes          cont   z26  srv serror rate                cont
z6   destination bytes     cont   z27  rerror rate                    cont
z7   land                  cont   z28  srv rerror rate                cont
z8   wrong fragment        cont   z29  same srv rate                  cont
z9   urgent                cont   z30  diff srv rate                  cont
z10  hot                   cont   z31  srv diff host rate             cont
z11  num failed logins     cont   z32  dst host count                 cont
z12  logged in             cont   z33  dst host srv count             cont
z13  num compromised       cont   z34  dst host same srv rate         cont
z14  root shell            cont   z35  dst host diff srv rate         cont
z15  su attempted          cont   z36  dst host same src port rate    cont
z16  num root              cont   z37  dst host srv diff host rate    cont
z17  num file creations    cont   z38  dst host serror rate           cont
z18  num shells            cont   z39  dst host srv serror rate       cont
z19  num access files      cont   z40  dst host rerror rate           cont
z20  num outbound cmds     cont   z41  dst host srv rerror rate       cont
z21  is host login         cont

MADE = P · med(|z_fj − med(z_fj)|)   (1)

where med is the median operator, z_fj denotes the j-th sample of the attribute z_f, and P = 1.4826 is a multiplicative constant typically used under the assumption of data normality. In this study, z_fj was considered an outlier if

z_fj > p · MADE   (2)

with p = 10. After removing the outliers, the KDDTrain+ size changed from 125973 to 85421, whereas the KDDTest+ size changed from 18793 to 11925. Table 4 reports the new dataset (denoted as NSL-KDD*) after eliminating the outliers. It is worth mentioning that the dataset is significantly unbalanced: it includes only 18 test instances of the U2R attack category. Hence, the U2R class was removed from the final dataset.
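As an illustration only, Eqs. (1)-(2) can be sketched in a few lines of numpy; the function name and the choice to drop a sample when any of its continuous features is flagged are assumptions, since the paper does not detail how per-feature outliers are aggregated into a per-record decision:

```python
import numpy as np

P, p = 1.4826, 10  # constants from Eqs. (1) and (2)

def remove_outliers(X):
    """Drop samples whose value on any continuous feature exceeds
    p * MADE of that feature; X has shape (n_samples, n_features)."""
    med = np.median(X, axis=0)
    made = P * np.median(np.abs(X - med), axis=0)  # Eq. (1), per feature
    made[made == 0] = np.inf                       # constant features never flag
    keep = (X <= p * made).all(axis=1)             # Eq. (2), per sample
    return X[keep]
```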

3.3.2. Data Normalization

The min-max normalization method was used to map the numeric feature values z_fj into the range [0, 1], according to:

z̃_fj = (z_fj − min(z_f)) / (max(z_f) − min(z_f))   (3)

where max(z_f) and min(z_f) represent the maximum and minimum values of the f-th (numeric) feature z_f, and z̃_fj is the normalized feature value in the range [0, 1].
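A minimal numpy sketch of Eq. (3) follows; applying the training-set minima and maxima to the test set is an assumption, since the paper does not state which statistics are used for KDDTest+:

```python
import numpy as np

def min_max_scale(train, test):
    """Scale each numeric feature into [0, 1] as in Eq. (3)."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant features
    return (train - lo) / span, (test - lo) / span
```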

Table 3: Attack profiles of the DoS, R2L, U2R and Probe classes.

Attack class   Attack profile
DoS            back, neptune, land, smurf, pod, teardrop
R2L            ftp write, imap, guess passwd, multihop, phf, warezclient, spy, warezmaster
U2R            loadmodule, buffer overflow, rootkit, perl
Probe          nmap, ipsweep, satan, portsweep

Figure 1: Scheme of the proposed framework. It consists of a pre-processing, a feature extraction and a classification module. As an example, the classifiers in the figure refer to the multi-classification task.


3.3.3. One-hot-encoding
The three categorical features protocol type, service and flag (z2, z3, z4, respectively) were transformed into numerical values using the one-hot-encoding technique. Specifically, each categorical attribute is represented by binary values. For example, the z2 feature (protocol type) has three attribute values: tcp, udp and icmp. Applying the one-hot-encoding technique, these were converted into the binary vectors [1,0,0], [0,1,0] and [0,0,1], respectively. Similarly, the z3 and z4 features (service and flag) were also converted into one-hot-encoded vectors. Overall, the 41-dimensional feature vectors were mapped into 122-dimensional vectors (38 continuous features and 84 binary values related to the features z2, z3, z4).
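The conversion can be sketched with a small helper; the helper name and explicit vocabulary argument are illustrative and not from the paper:

```python
def one_hot(values, vocabulary):
    """Map each categorical value to a binary indicator vector."""
    index = {v: i for i, v in enumerate(vocabulary)}
    vectors = []
    for v in values:
        vec = [0] * len(vocabulary)
        vec[index[v]] = 1
        vectors.append(vec)
    return vectors

# e.g. the three attribute values of protocol type (z2)
one_hot(["tcp", "icmp"], ["tcp", "udp", "icmp"])  # -> [[1, 0, 0], [0, 0, 1]]
```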

Table 4: NSL-KDD* dataset after discarding the outliers.

NSL-KDD*     Total   Normal   DoS     Probe   R2L    U2R
KDD*Train+   85421   51551    23272   9683    874    41
KDD*Test+    11925   7341     1975    620     1971   18

3.4. Feature extraction

This processing module extracts the most correlated features. For each continuous feature, the percentage of zeros is evaluated for both the KDD*Train+ and KDD*Test+ sets. Figure 2 illustrates the distribution of null values for each numeric variable in the KDD*Train+ set. In this work, feature vectors with more than 80% zeros are excluded from subsequent processing. Specifically, 20 variables (indicated in red in Figure 2) are discarded, whereas the remaining 18 continuous features are combined with the 84 one-hot-encoded values, yielding a 102-dimensional feature vector. This vector is the input of the proposed shallow (MLP, L-SVM, Q-SVM, LDA, QDA) and deep (AE, LSTM) classifiers.
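A minimal numpy sketch of this filtering rule (the function name is illustrative; the zero percentages are computed on the training set, as in Figure 2):

```python
import numpy as np

def drop_sparse_features(X_train, X_test, threshold=0.80):
    """Discard continuous features whose fraction of zeros in the
    training set exceeds the threshold (80% in this work)."""
    zero_fraction = (X_train == 0).mean(axis=0)
    keep = zero_fraction <= threshold
    return X_train[:, keep], X_test[:, keep]
```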

Figure 2: Histogram of null values in the 38 numeric variables of the KDD*Train+ set. Features with more than 80% zeros are shown in red and are removed from the present study.

3.5. Classification
Two deep architectures, based on an AE and an LSTM, and three shallow architectures, based on a standard MLP, SVM (with linear and quadratic kernels) and DA (with linear and quadratic discriminant functions), are developed to detect the normal and abnormal categories of the NSL-KDD dataset (Normal, DoS, R2L and Probe). Details are presented in the following subsections.

3.5.1. Autoencoder
In this study, a deep classifier based on an AE was developed. An AE architecture consists of an encoder and a decoder: first, it transforms the input data vector into a (typically lower-dimensional) representation (encoder); then, it attempts to reconstruct the original input from the compressed vector (decoder). The AE is trained in an unsupervised fashion and is able to capture significant features from unlabeled data [32]. Figure 3 shows a classic AE model with a single hidden layer. The input data vector z is encoded into a lower-dimensional representation e:

e = ς(zW + b)   (4)

where W represents the weight matrix, b is the bias vector and ς denotes the activation function of the encoder. Afterwards, the decoding operation produces the reconstruction of the input z from the encoded representation e:

z̃ = ζ(eW^T + b)   (5)

where ζ denotes the activation function of the decoder and z̃ is the reconstructed vector.
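Equations (4)-(5) amount to the following forward pass, sketched here in numpy; the tied decoder weights W^T follow Eq. (5), the saturating linear activation anticipates Section 3.5.2, and a separate decoder bias is assumed for dimensional consistency:

```python
import numpy as np

def satlin(s):
    """Saturating linear activation (see Section 3.5.2)."""
    return np.clip(s, 0.0, 1.0)

def ae_forward(z, W, b_enc, b_dec):
    """Single-hidden-layer AE: encode (Eq. 4), then decode (Eq. 5).
    z: (102,), W: (102, 50), b_enc: (50,), b_dec: (102,)."""
    e = satlin(z @ W + b_enc)        # compressed 50-dimensional code
    z_rec = satlin(e @ W.T + b_dec)  # reconstruction in the input space
    return e, z_rec
```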

Figure 3: Standard autoencoder configuration. The AE consists of two stages: an encoding and a decoding operation. The encoding operation maps z into a compressed representation e, whereas the decoding operation attempts to reconstruct z from e, so that z̃ ≈ z.

3.5.2. Proposed AE Architecture

Figure 4 (a) shows the proposed autoencoder framework (AE[102:50:102]). Specifically, the AE encodes the 102-dimensional feature representation (z) into a 50-dimensional vector (e) and then decodes it back to the same input feature space. In this study, the AE[102:50:102] is trained in an unsupervised manner, using the scaled conjugate gradient (SCG) method, for 10^2 epochs. Moreover, the saturating linear activation function ς(s) is adopted in both the compression and reconstruction operations: ς(s) = 0 for s ≤ 0, ς(s) = s for 0 < s < 1, and ς(s) = 1 for s ≥ 1.
The reconstruction error between z and z̃ is quantified using the mean squared error (MSE). It is worth mentioning that the minimum error, 0.0083, was obtained with 50 hidden neurons. After training the AE[102:50:102], the 50 latent features are used as input to a dense fully connected layer with a softmax activation function (the AE50 classifier, Figure 4 (b)). At this stage, the softmax layer is trained in a supervised fashion for binary or multi-classification purposes. Then, a fine-tuning approach is used: the whole architecture shown in Figure 4 (b) is re-trained with a supervised learning algorithm in order to improve the classification performance. The AE50 classifier was developed using MATLAB R2018a (The MathWorks, Inc., Natick, MA, USA) and trained until the cross-entropy loss function [33] converged, that is, for 3×10^2 epochs.
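The AE50 was implemented in MATLAB; purely as an illustrative sketch of the same three-stage procedure (unsupervised pre-training, supervised softmax training, end-to-end fine-tuning), the pipeline could be written in Keras as follows. Adam stands in for SCG, which Keras does not provide; X_train/y_train are assumed to hold the 102-dimensional vectors and class labels; and the epoch count of the intermediate softmax stage is not reported in the paper:

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

satlin = lambda s: tf.clip_by_value(s, 0.0, 1.0)  # saturating linear activation

# Stage 1: unsupervised AE[102:50:102] pre-training (reconstruct the inputs).
inp = keras.Input(shape=(102,))
enc = layers.Dense(50, activation=satlin, name="encoder")
dec = layers.Dense(102, activation=satlin, name="decoder")
autoencoder = keras.Model(inp, dec(enc(inp)))
autoencoder.compile(optimizer="adam", loss="mse")  # Adam replaces SCG
autoencoder.fit(X_train, X_train, epochs=100, batch_size=128)

# Stage 2: softmax head on the 50 latent features, encoder frozen.
head = layers.Dense(4, activation="softmax")       # 2 units for the binary task
classifier = keras.Model(inp, head(enc(inp)))
enc.trainable = False
classifier.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])
classifier.fit(X_train, y_train, epochs=100, batch_size=128)

# Stage 3: end-to-end fine-tuning of the whole encoder + softmax stack.
enc.trainable = True
classifier.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])
classifier.fit(X_train, y_train, epochs=300, batch_size=128)  # 3x10^2 epochs
```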

Figure 4: (a) AE based classifier. The AE[102:50:102] reduces the 102-dimensional feature vector (z) into the 50 most latent features (e) and then reconstructs the original input from the 50 compressed features. (b) Afterwards, the feature vector e is used as input to a final softmax layer (o) for binary or multi-classification. Finally, the whole structure is fine-tuned using the conventional back propagation algorithm. In the figure, the AE classifier refers to multi-class detection.

3.5.3. Long Short-Term Memory

In order to compare the AE classifier with other deep learning architectures, a Long Short-Term Memory (LSTM) based model was developed. LSTM units are the memory blocks of a recurrent neural network (RNN) [34]. Figure 5 shows a standard LSTM architecture. It includes a cell (g), an input gate (i), an output gate (o) and a forget gate (l). A layer of LSTM units is able to learn long-term dependencies between the time steps of a data sequence. Such an LSTM layer has two states: the hidden state (or output state), which contains the output at time step t, and the cell state, which stores the information learned from the previous time steps. At each time step t, the hidden and cell states are updated by means of the aforementioned gates:

c_t = l_t c_{t-1} + i_t g_t   (6)

h_t = o_t tanh(c_t)   (7)

where:

i_t = σ_g(W_i z + R_i h_{t-1} + b_i)   (8)

l_t = σ_g(W_l z + R_l h_{t-1} + b_l)   (9)

g_t = tanh(W_g z + R_g h_{t-1} + b_g)   (10)

o_t = σ_g(W_o z + R_o h_{t-1} + b_o)   (11)

and where σ_g represents the sigmoid activation function, W the input weight matrices, R the recurrent weight matrices and b the bias vectors; the products between gate activations and states are element-wise.

Figure 5: Architecture of a standard LSTM unit.

3.5.4. Proposed LSTM Architecture

Figure 6 shows the designed LSTM classifier. It consists of an input layer, one LSTM layer and a dense output layer. Specifically, for a fair comparison, the LSTM layer uses 50 cells to encode the input information. The output is then fed into a dense fully connected layer with 2 or 4 neurons (with softmax activation function) for binary and multi-classification, respectively. The input at layer k is the hidden state value computed by layer k − 1. In this study, the LSTM model was trained using the Adam (adaptive moment estimation) optimizer, with a learning rate of 0.01 and a mini-batch size of 128. These learning parameters were chosen empirically after several experimental tests.
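A hedged Keras sketch of this classifier follows; reshaping each 102-dimensional vector into a length-one sequence and the epoch count are assumptions, since the paper does not state how the static feature vectors are presented to the recurrent layer or how long training ran:

```python
from tensorflow import keras
from tensorflow.keras import layers

# X_train: (N, 102) feature vectors, y_train: integer labels (assumed available)
X_seq = X_train.reshape(-1, 1, 102)  # each sample as a one-step sequence

model = keras.Sequential([
    keras.Input(shape=(1, 102)),
    layers.LSTM(50),                        # 50 cells, as in the paper
    layers.Dense(4, activation="softmax"),  # 2 neurons for the binary task
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.01),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(X_seq, y_train, batch_size=128, epochs=20)  # epoch count assumed
```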

Figure 6: LSTM architecture. Similarly to the AE architecture, it consists of one hidden layer of 50 units followed by a softmax layer for binary or multi-classification. The architecture in the figure refers to the multi-classification task.

3.5.5. Conventional Classifiers

In order to also compare the proposed deep AE classifier with conventional techniques, the following shallow classifiers were developed (a minimal scikit-learn sketch is given after the list):

• MLP classifier: the MLP is a feed-forward neural network trained with a supervised learning algorithm [35]. Figure 7 illustrates the proposed shallow MLP classifier. It is to be noted that, for a fair comparison, the MLP and AE architectures have the same structure. Indeed, the MLP classifier consists of one hidden layer with 50 neurons and a softmax output layer for the classification tasks.

• SVM classifier: the SVM technique is based on statistical learning theory [36]. SVM finds the best hyperplane providing the maximum separation between classes. In this study, an SVM classifier with a linear kernel (L-SVM) and an SVM classifier with a quadratic kernel (Q-SVM) are implemented. A detailed mathematical formulation of SVM is reported in [37].

• DA classifier: DA is a statistical method typically used in machine learning. The goal of DA is to reduce dimensionality while keeping good separability among classes. Specifically, it projects the data samples onto a lower-dimensional space so that the class separability is maximal and the dispersion of samples belonging to the same class is minimal. In this study, a discriminant classifier with a linear function (LDA) and a discriminant classifier with a quadratic function (QDA) are implemented. A detailed mathematical formulation of DA techniques is reported in [38].
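The sketch below instantiates the five shallow baselines with scikit-learn; the degree-2 polynomial kernel as the quadratic SVM and the default solver settings are assumptions, since the paper reports no hyperparameters for these models:

```python
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis)

# X_train/X_test are assumed to hold the 102-dimensional vectors of Section 3.4
models = {
    "MLP":   MLPClassifier(hidden_layer_sizes=(50,)),  # one hidden layer, 50 units
    "L-SVM": SVC(kernel="linear"),
    "Q-SVM": SVC(kernel="poly", degree=2),             # quadratic kernel
    "LDA":   LinearDiscriminantAnalysis(),
    "QDA":   QuadraticDiscriminantAnalysis(),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))           # test-set accuracy
```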

Figure 7: MLP architecture. Similarly to the AE architecture, it consists of one hidden layer of 50 units followed by a softmax layer for binary or multi-classification. The architecture in the figure refers to the multi-classification task.

4. Experimental results

The performance of the proposed classifiers is measured with the traditional metrics of precision, recall, F1 score (or F-measure) and accuracy:

Precision = TP / (TP + FP)   (12)

Recall = TP / (TP + FN)   (13)

F1 score = 2 · (Precision · Recall) / (Precision + Recall)   (14)

Accuracy = (TP + TN) / (TP + FP + TN + FN)   (15)

where TP (True Positive) is the number of instances correctly detected as anomalous; TN (True Negative) is the number of instances correctly detected as normal; FP (False Positive) is the number of normal traffic patterns misclassified as anomalous; and FN (False Negative) is the number of anomalous traffic patterns erroneously identified as normal. In order to estimate the ability of the classifiers to correctly detect normal and abnormal events, the performance of the proposed architectures (AE, LSTM, MLP, L-SVM, Q-SVM, LDA, QDA) was studied in both binary classification (Normal, Abnormal) and multi-classification (Normal, DoS, Probe, R2L) modes. It is to be noted that, since the F1 score combines precision and recall information (Eq. 14), the following considerations are based mainly on this measurement.
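For clarity, Eqs. (12)-(15) translate directly into code (a minimal sketch; the function name is illustrative):

```python
def classification_scores(tp, fp, tn, fn):
    """Precision, recall, F1 and accuracy from Eqs. (12)-(15)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return precision, recall, f1, accuracy
```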

4.1. Binary classification
Table 6 reports the outcomes of the binary classification experiments, where the abnormal class includes the DoS, Probe and R2L categories. The MLP classifier achieved F1 scores of 86.65% and 70.69% in detecting the normal and abnormal categories, respectively. As regards the SVM classifiers, Q-SVM outperformed L-SVM in terms of average F1 score (81.39% vs. 77.54%). Indeed, the Q-SVM classifier achieved better performance in detecting both the normal and abnormal classes (F1 scores of 87.87% and 74.90%, respectively). As regards discriminant analysis, QDA similarly outperformed LDA in terms of average F1 score (77.78% vs. 75.16%). However, the LDA classifier achieved a better F-measure in detecting normal samples (85.26%), whereas QDA was better at detecting anomalies (71.66%). As regards the deep classifiers, the LSTM architecture achieved an average F1 score of 79.24%. In contrast, the proposed deep AE classifier showed the highest F1 score, achieving an average value of 82.00%. Moreover, the AE based classifier also outperformed the aforementioned methods in terms of accuracy (Table 5), with a performance rate of up to 84.21%, as compared to the LSTM, MLP, L-SVM, Q-SVM, LDA and QDA classifiers, which achieved accuracies of 82.04%, 81.65%, 80.8%, 83.15%, 79.27% and 76.84%, respectively. Similar results were achieved with the area under the curve (AUC) of the Receiver Operating Characteristic (ROC) [39]. Indeed, as can be seen in Figure 8 (a), the AE classifier reported the best AUC score (AUC_AE = 95.55%).

4.2. Multi-classification
Table 7 reports the outcomes of the multi-classification experiments. As in the binary classification analysis, the shallow MLP, L-SVM, Q-SVM, LDA, QDA and deep LSTM, AE based classifiers were compared. The simulation results showed the following. The MLP classifier reported F1 scores of 87.1%, 97.08% and 77.13% for the Normal, DoS and Probe attack classes, respectively. The Q-SVM outperformed the L-SVM classifier in terms of average F1 score, with values of 75.11% and 69.76%, respectively: L-SVM reported F1 scores of 86.55%, 96.69% and 86.22%, whereas Q-SVM reported F1 values of 88.32%, 97.41% and 82.81% for the Normal, DoS and Probe attack classes, respectively. The LDA, instead, outperformed the QDA classifier in terms of average F1 score, with values of 76.49% and 64.36%, respectively. The LDA classifier reported F-measures of 90.69%, 91.14% and 70.87%, whereas the QDA classifier achieved values of 87.98%, 74.64% and 47.86% for the Normal, DoS and Probe attack classes, respectively. However, it is to be noted that the MLP, L-SVM and Q-SVM based classifiers were not able to discriminate the R2L attack category accurately (reporting F1 scores of 11.74%, 9.45% and 31.90%, respectively). The DA classifiers, instead, achieved better results in detecting the R2L anomaly, with F1 scores of 53.27% (LDA) and 46.96% (QDA). As regards the deep classifiers, the LSTM architecture achieved an average F-measure of 67.17%. The LSTM classifier reported very good discrimination performance only in detecting the Normal, DoS and Probe attack types (F-measure values of 86.12%, 96.90% and 84.05%, respectively) and remained inadequate in detecting the R2L attack class. The deep AE classifier, in contrast, similarly to the binary classification experiments, outperformed all the other machine learning algorithms, reporting F1 scores of up to 98%. Furthermore, it is worth mentioning that the AE classifier also outperformed the LSTM and conventional techniques in terms of accuracy, achieving the highest accuracy of up to 87%. In contrast, the LSTM, MLP, L-SVM, Q-SVM, LDA and QDA classifiers reported accuracies of 80.67%, 81.43%, 81.4%, 83.65%, 83.17% and 79.47%, respectively. Also in this scenario, this result was confirmed by evaluating the AUC: as can be seen in Figure 8 (b), the AE classifier reported the best performance (AUC_AE = 96.1%).

Table 5: Accuracies of the proposed AE, LSTM, MLP, L-SVM, Q-SVM, LDA, QDA classifiers for binary and multi-classification.

Method Binary-classification Multi-classification


AE 84.21% 87%
LSTM 82.04% 80.67%
MLP 81.65% 81.43%
L-SVM 80.8% 81.4%
Q-SVM 83.15% 83.65%
LDA 79.27% 83.17%
QDA 76.84% 79.47%

Table 6: Binary classification performance (Precision, Recall, F1 score) of AE, LSTM, MLP,
L-SVM, Q-SVM, LDA, QDA classifiers.

Precision
Attack class AE LSTM MLP L-SVM Q-SVM LDA QDA
Normal 81.09% 79.00% 78.49% 77.68% 80.84% 75.81% 81.08%
Abnormal 92.91% 91.25% 91.57% 90.96% 91.34% 92.38% 76.34%
AVG 87% 85.13% 85.03% 84.32% 86.09% 84.09% 78.71%
Recall
Attack class AE LSTM MLP L-SVM Q-SVM LDA QDA
Normal 96.96% 96.47% 96.69% 96.55% 96.24% 97.41% 86.94%
Abnormal 63.79% 58.92% 57.57% 55.56% 63.48% 50.22% 67.52%
AVG 80.37% 77.70% 77.13% 76.06% 79.86% 73.81% 77.23%
F1 score
Attack class AE LSTM MLP L-SVM Q-SVM LDA QDA
Normal 88.32% 86.86% 86.65% 86.09% 87.87% 85.26% 83.91%
Abnormal 75.64% 71.61% 70.69% 68.99% 74.90% 65.07% 71.66%
AVG 81.98% 79.24% 78.67% 77.54% 81.39% 75.16% 77.78%

5. Discussion

The present work introduces an innovative IDS based on a statistically driven


deep AE. For experiments and analysis, the benchmark NSL-KDD dataset was

Table 7: Multi-classification performance (Precision, Recall, F1 score) of AE, LSTM, MLP,
L-SVM, Q-SVM, LDA, QDA classifiers.

Precision
Attack class AE LSTM MLP L-SVM Q-SVM LDA QDA
Normal 85.03% 77.70% 79.46% 78.37% 81.35% 88.01% 84.28%
Dos 97.05% 96.39% 97.21% 96.52% 97.71% 95.45% 97.79%
Probe 69.82% 75.26% 64.33% 77.03% 71.15% 60.20% 31.62%
R2L 99.49% 80.00% 99.19% 95.15% 98.68% 79.62% 99.34%
AVG 87.85% 82.34% 85.05% 86.77% 87.22% 80.82% 78.26%
Recall
Attack class AE LSTM MLP L-SVM Q-SVM LDA QDA
Normal 96.19% 96.58% 96.35% 96.64% 96.59% 93.53% 92.02%
Dos 98.18% 97.42% 96.96% 96.86% 97.11% 87.19% 60.35%
Probe 94.03% 95.16% 96.29% 97.90% 99.03% 86.13% 98.39%
R2L 39.78% 0.81% 6.24% 4.97% 19.03% 40.03% 30.75%
AVG 82.04% 72.49% 73.96% 74.09% 77.94% 76.72% 70.38%
F1 score
Attack class AE LSTM MLP L-SVM Q-SVM LDA QDA
Normal 90.27% 86.12% 87.10% 86.55% 88.32% 90.69% 87.98%
Dos 97.61% 96.90% 97.08% 96.69% 97.41% 91.14% 74.64%
Probe 80.14% 84.05% 77.13% 86.22% 82.81% 70.87% 47.86%
R2L 56.83% 1.61% 11.74% 9.45% 31.90% 53.27% 46.96%
AVG 81.21% 67.17% 68.26% 69.73% 75.11% 76.49% 64.36%

employed. The strengths and effectiveness of the proposed IDS were evaluated using standard measurements including precision, recall, F1 score and accuracy. The most correlated features were extracted through statistical analysis and used as input to the deep (AE, LSTM) and shallow ML approaches, including MLP, L-SVM, Q-SVM, LDA and QDA. Moreover, both binary (Normal vs. Abnormal) and multi-classification (Normal vs. DoS vs. R2L vs. Probe) were performed. As regards the shallow ML architectures, experimental results showed that the Q-SVM classifier achieved the best performance for both binary (83.15% accuracy) and multi-class discrimination (83.65% accuracy), as compared with the MLP, L-SVM, LDA and QDA classifiers. As regards the deep ML architectures, the AE classifier achieved the highest performance for both binary (84.21% accuracy) and multi-class discrimination (87% accuracy), as compared with the LSTM classifier. Hence, the comparative results with shallow and deep classifiers showed that the deep autoencoder architecture outperformed the other proposed ML approaches. Furthermore, this result was also confirmed in terms of AUC: 95.65% and 96.1% for binary and multi-classification, respectively. It is to be noted that the optimal AE structure was found by estimating the performance of AEs with different numbers of hidden layers (HL) and hidden units. Specifically, Table 9 reports the binary and multi-classification accuracies of different AE architectures. As can be seen, the minimum classification performance of 79.56% was obtained with an AE40,25,12 (in the binary anomaly detection scenario) and of 69.04% with an AE50,30,12 (in the multi-class anomaly detection scenario). However, the highest performance
was achieved by the proposed AE50: 84.21% for binary classification and 87% for multi-classification. For a fair comparison, the shallow and deep networks were developed with the same structure: indeed, both the MLP and LSTM classifiers consisted of 50 hidden units. Furthermore, the proposed deep AE was also compared with the most recent approaches in the literature that used the NSL-KDD dataset. Since most existing works focused on discriminating NSL-KDD attack types, we compared the performance of the AE classifier in multi-classification mode.
Recently, the authors in [40] proposed a hardware-software co-design machine learning accelerator based on a sequential learning algorithm, achieving accuracy up to 76.04% with a training time of 144.5 s. Similarly, the authors in [28] proposed a sparse AE architecture reporting an accuracy of 79.10%, whereas in [27] the authors designed a Gaussian-Bernoulli RBM consisting of 7 layers of 100 neurons, achieving an accuracy rate of up to 73.23%. A stacked non-symmetric deep AE was developed in [30]: specifically, the authors proposed a 3-hidden-layer AE combined with a Random Forest classifier, achieving a multi-classification accuracy rate of up to 85.42% with a minimum training time of 644.84 s. In [29], the authors modelled an RNN-IDS with 80 hidden units, reporting a multi-class accuracy of 81.29% and a training time of 11444 s.
In contrast to these approaches, we proposed a statistical analysis driven intelligent AE classifier that achieved multi-class accuracy of up to 87%. However, it is to be noted that, although the proposed IDS outperformed the aforementioned works, a difference of about 4% in accuracy was observed when compared with Shone et al. [30]. Nevertheless, the AE developed here has a very simple architecture, with only 1 hidden layer of just 50 hidden units. Consequently, the proposed IDS was optimized in terms of the number of learning parameters and training time. Indeed, the training process, executed on a high-performance GeForce RTX 2080 Ti GPU installed on an Intel(R) Core(TM) i7-8000K CPU with 64 GB RAM, took only 22.53 s.
Table 8: Performance of the proposed IDS compared with recent state-of-the-art techniques.

Method Accuracy
Proposed AE 87%
Proposed LSTM 80.67%
Imamverdiyev et al. [27] 73.23%
Huang et al. [40] 76.04%
Javaid et al. [28] 79.10%
Yin et al. [29] 81.29%
Shone et al. [30] 85.42%

6. Conclusion
In this paper, the authors presented a novel statistical analysis and autoencoder driven intelligent intrusion detection approach. The proposed IDS was tested

Table 9: Evaluation performance of AE with different hidden layers (HL).

Accuracy Accuracy
Classifier HL1 HL2 HL3
Binary Classification Multi-Classification
AE40 40 - - 80.87% 78.12%
AE40,20 40 20 - 80.65% 77.17%
AE40,20,12 40 20 12 79.97% 77.83%
AE40,25 40 25 - 80.28% 79.36%
AE40,25,12 40 25 12 79.56% 78.23%
AE40,30 40 30 - 80.48% 79.00%
AE40,30,12 40 30 12 79.84% 76.51%
AE50 50 - - 84.24% 87%
AE50,20 50 20 - 81.07% 78.62%
AE50,20,12 50 20 12 80.77% 75.70%
AE50,25 50 25 - 82.03% 81.84%
AE50,25,12 50 25 12 81.36% 80.65%
AE50,30 50 30 - 81.42% 81.13%
AE50,30,12 50 30 12 80.84% 69.04%
AE60 60 - - 80.26% 79.82%
AE60,20 60 20 - 80.49% 79.23%
AE60,20,12 60 20 12 79.94% 77.51%
AE60,25 60 25 - 81.28% 79.26%
AE60,25,12 60 25 12 81.18% 78.68%
AE60,30 60 30 - 80.48% 79.27%
AE60,30,12 60 30 12 80.24% 74.50%

using the NSL-KDD dataset as a benchmark. The most significant features, extracted via the statistical analysis described above, were fed into an AE architecture consisting of a single hidden layer of 50 units (AE50). The proposed AE50 classifier was compared with deep and traditional algorithms (Table 5) as well as recent state-of-the-art techniques (Table 8). The comparative results showed that the AE50 classifier achieved higher performance than all the other methods (84.21% accuracy in binary classification (Normal, Abnormal) and 87% accuracy in multi-classification (Normal, DoS, R2L, Probe)). The efficacy of the proposed AE50 framework was also assessed through ROC analysis: the AE50 achieved the highest AUC score in both scenarios (95.65% in 2-way discrimination and 96.1% in 4-way discrimination).
In the future, we intend to develop more accurate deep architectures able to manage real-time data flows similar to the NSL-KDD instances, in order to identify malicious attacks in real time. In addition, in order to exploit long-term learning and faster decision criteria, together with reduced computational complexity for real-time big data analysis, we will explore the integration of the methods proposed in [41], [42] with the work presented here.

7. Acknowledgements
This work was funded by the UK EPSRC (Engineering and Physical Sciences
Research Council) grant, code: EP/M026981/1.

Figure 8: ROC curves of the proposed AE, LSTM, MLP, L-SVM, Q-SVM, LDA, QDA classifiers for binary (a) and multi-classification (b).
References

[1] H. Sundmaeker, P. Guillemin, P. Friess, S. Woelfflé, Vision and challenges


for realising the internet of things, Cluster of European Research Projects
on the Internet of Things, European Commission 3 (3) (2010) 34–36.
[2] S. Goel, K. Williams, E. Dincelli, Got phished? internet security and
human vulnerability, Journal of the Association for Information Systems
18 (1) (2017) 22.
[3] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015)
436.
[4] S. Gasparini, M. Campolo, C. Ieracitano, N. Mammone, E. Ferlazzo,
C. Sueri, G. G. Tripodi, U. Aguglia, F. C. Morabito, Information theoretic-
based interpretation of a deep neural network approach in diagnosing psy-
chogenic non-epileptic seizures, Entropy 20 (2) (2018) 43.

[5] C. Ieracitano, N. Mammone, A. Bramanti, A. Hussain, F. C. Morabito, A


convolutional neural network approach for classification of dementia stages
based on 2d-spectral representation of eeg recordings, Neurocomputing.
[6] N. Zeng, Z. Wang, H. Zhang, W. Liu, F. E. Alsaadi, Deep belief networks
for quantitative analysis of a gold immunochromatographic strip, Cognitive
Computation 8 (4) (2016) 684–692.
[7] N. Zeng, H. Qiu, Z. Wang, W. Liu, H. Zhang, Y. Li, A new switching-
delayed-pso-based optimized svm algorithm for diagnosis of alzheimers dis-
ease, Neurocomputing 320 (2018) 195–202.

[8] M. Mahmud, M. S. Kaiser, A. Hussain, S. Vassanelli, Applications of deep


learning and reinforcement learning to biological data, IEEE transactions
on neural networks and learning systems 29 (6) (2018) 2063–2079.
[9] K. Dashtipour, M. Gogate, A. Adeel, C. Ieracitano, H. Larijani, A. Hussain,
Exploiting deep learning for persian sentiment analysis, in: International
Conference on Brain Inspired Cognitive Systems, Springer, 2018, pp. 597–
604.
[10] Y. Ma, H. Peng, T. Khan, E. Cambria, A. Hussain, Sentic lstm: a hybrid
network for targeted aspect-based sentiment analysis, Cognitive Computa-
tion 10 (4) (2018) 639–650.

[11] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-


scale image recognition, arXiv preprint arXiv:1409.1556.
[12] X. Sun, M. Lv, Facial expression recognition based on a hybrid model
combining deep and shallow features, Cognitive Computation (2019) 1–11.

[13] G. Zhong, S. Yan, K. Huang, Y. Cai, J. Dong, Reducing and stretch-
ing deep convolutional activation features for accurate image classification,
Cognitive Computation 10 (1) (2018) 179–186.
[14] N. Zeng, H. Zhang, B. Song, W. Liu, Y. Li, A. M. Dobaie, Facial expression
recognition via learning deep sparse autoencoders, Neurocomputing 273
(2018) 643–649.
[15] L. Wang, B. Jiang, Z. Tu, A. Hussain, J. Tang, Robust pixelwise saliency
detection via progressive graph rankings, Neurocomputing 329 (2019) 433–
446.
[16] M. M. Najafabadi, F. Villanustre, T. M. Khoshgoftaar, N. Seliya, R. Wald,
E. Muharemagic, Deep learning applications and challenges in big data
analytics, Journal of Big Data 2 (1) (2015) 1.
[17] M. Tavallaee, E. Bagheri, W. Lu, A. A. Ghorbani, A detailed analysis
of the kdd cup 99 data set, in: Computational Intelligence for Security
and Defense Applications, 2009. CISDA 2009. IEEE Symposium on, IEEE,
2009, pp. 1–6.
[18] B. Ingre, A. Yadav, Performance analysis of nsl-kdd dataset using ann, in:
Signal Processing And Communication Engineering Systems (SPACES),
2015 International Conference on, IEEE, 2015, pp. 92–96.
[19] L. M. Ibrahim, D. T. Basheer, M. S. Mahmod, A comparison study for
intrusion database (kdd99, nsl-kdd) based on self organization map (som)
artificial neural network, Journal of Engineering Science and Technology
8 (1) (2013) 107–119.
[20] H. Mohamed, H. Hefny, A. Alsawy, Intrusion detection system using ma-
chine learning approaches, Egyptian Computer Science Journal 42 (3).
[21] Y. Gao, Y. Liu, Y. Jin, J. Chen, H. Wu, A novel semi-supervised learning
approach for network intrusion detection on cloud-based robotic system,
IEEE Access.
[22] K. Alrawashdeh, C. Purdy, Toward an online anomaly intrusion detection
system based on deep learning, in: Machine Learning and Applications
(ICMLA), 2016 15th IEEE International Conference on, IEEE, 2016, pp.
195–200.
[23] T. A. Tang, L. Mhamdi, D. McLernon, S. A. R. Zaidi, M. Ghogho, Deep
learning approach for network intrusion detection in software defined net-
working, in: Wireless Networks and Mobile Communications (WINCOM),
2016 International Conference on, IEEE, 2016, pp. 258–263.
[24] J. Kim, N. Shin, S. Y. Jo, S. H. Kim, Method of intrusion detection using
deep neural network, in: Big Data and Smart Computing (BigComp), 2017
IEEE International Conference on, IEEE, 2017, pp. 313–316.

[25] B. Yan, G. Han, Effective feature extraction via stacked sparse autoencoder
to improve intrusion detection system, IEEE Access 6 (2018) 41238–41248.
[26] C. Xu, J. Shen, X. Du, F. Zhang, An intrusion detection system using
a deep neural network with gated recurrent units, IEEE Access 6 (2018)
48697–48707.
[27] Y. Imamverdiyev, F. Abdullayeva, Deep learning method for denial of ser-
vice attack detection based on restricted boltzmann machine, Big Data
6 (2) (2018) 159–169.
[28] A. Javaid, Q. Niyaz, W. Sun, M. Alam, A deep learning approach for
network intrusion detection system, in: Proceedings of the 9th EAI In-
ternational Conference on Bio-inspired Information and Communications
Technologies (formerly BIONETICS), ICST (Institute for Computer Sci-
ences, Social-Informatics and Telecommunications Engineering), 2016, pp.
21–26.
[29] C. Yin, Y. Zhu, J. Fei, X. He, A deep learning approach for intrusion
detection using recurrent neural networks, IEEE Access 5 (2017) 21954–
21961.
[30] N. Shone, T. N. Ngoc, V. D. Phai, Q. Shi, A deep learning approach to
network intrusion detection, IEEE Transactions on Emerging Topics in
Computational Intelligence 2 (1) (2018) 41–50.
[31] M. Tavallaee, E. Bagheri, W. Lu, A. A. Ghorbani, NSL-KDD dataset, available at http://www.unb.ca/research/iscx/dataset/iscx-NSL-KDD-dataset.html [accessed 28 Feb. 2016].
[32] G. E. Hinton, R. R. Salakhutdinov, Reducing the dimensionality of data
with neural networks, Science 313 (5786) (2006) 504–507.
[33] J. Shore, R. Johnson, Axiomatic derivation of the principle of maximum
entropy and the principle of minimum cross-entropy, IEEE Transactions on
information theory 26 (1) (1980) 26–37.
[34] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural computa-
tion 9 (8) (1997) 1735–1780.
[35] B. Yegnanarayana, Artificial neural networks, PHI Learning Pvt. Ltd.,
2009.
[36] V. N. Vapnik, An overview of statistical learning theory, IEEE transactions
on neural networks 10 (5) (1999) 988–999.
[37] I. Steinwart, A. Christmann, Support vector machines, Springer Science &
Business Media, 2008.
[38] A. J. Izenman, Linear discriminant analysis, in: Modern multivariate sta-
tistical techniques, Springer, 2013, pp. 237–280.

[39] A. P. Bradley, The use of the area under the roc curve in the evaluation of
machine learning algorithms, Pattern recognition 30 (7) (1997) 1145–1159.
[40] H. Huang, R. S. Khalid, W. Liu, H. Yu, Work-in-progress: a fast online
sequential learning accelerator for iot network intrusion detection, in: Hard-
ware/Software Codesign and System Synthesis (CODES+ ISSS), 2017 In-
ternational Conference on, IEEE, 2017, pp. 1–2.
[41] A. Adeel, H. Larijani, A. Ahmadinia, Random neural network based novel
decision making framework for optimized and autonomous power control
in lte uplink system, Physical Communication 19 (2016) 106–117.

[42] X. Yang, K. Huang, R. Zhang, J. Y. Goulermas, A. Hussain, A new two-


layer mixture of factor analyzers with joint factor loading model for the
classification of small dataset problems, Neurocomputing 312 (2018) 352–
363.

Cosimo Ieracitano received the M.Eng. (summa cum laude) and Ph.D. de-
grees (with additional label of Doctor Europaeus) from the University Mediter-
ranea of Reggio Calabria (UNIRC), Italy, in 2013 and 2019, respectively. He
is currently a Research Fellow at the Neurolab group of the DICEAM Depart-
ment of the same University (UNIRC). He was a Visiting Master Student at
ETH Zurich and a Visiting PhD Student at the University of Stirling in 2013
and 2018, respectively. He is author/co-author of publications in peer-reviewed
national/international journals and conference contributions. He is Local Ar-
rangements Chair for IEEE WCCI 2020. His main research interests include:
information theory, machine learning, deep learning techniques and biomedical
signal processing, in particular EEG signals of subjects affected by neuropatholo-
gies.

Ahsan Adeel holds B. Eng. (Hons), MSc (EEE), and PhD (Cognitive Com-
puting) degrees. Following an EPSRC/MRC prestigious fellowship at the Uni-
versity of Stirling (2016-18), he is currently a Lecturer (Assistant Professor) in
Computing Science at the University of Wolverhampton, UK, where he is lead-
ing the Conscious Multisensory Integration (CMI) Lab. He is a Visiting Fellow
at MIT Synthetic Intelligence Lab and Computational Neuroscience Lab (Uni-
versity of Oxford). His ongoing multidisciplinary research aims to explore and
exploit the power of advanced AI to design unorthodox brain-inspired cognitive
computing architectures by integrating suitable deep machine learning, reason-
ing, and optimization algorithms. His focused approaches include biophysical
and hardware-efficient neural models, explainable artificial intelligence, opti-
mized resource management, multimodal fusion, context-aware decision-making,
low power 5G IoT devices, and neuromorphic chips.

Francesco C. Morabito (M’89 - SM’00) was the Dean with the Faculty

of Engineering and Deputy Rector with the University Mediterranea of Reg-
gio Calabria, Reggio Calabria, Italy, where he is currently a Full Professor of
Electrical Engineering. He is also serving as the Vice-Rector for International
and Institutional Relations. He has authored or co-authored over 400 papers
in international journals/conference proceedings in various fields of engineering
(radar data processing, nuclear fusion, biomedical signal processing, nondestruc-
tive testing and evaluation, machine learning, and computational intelligence).
He has co-authored 15 books and holds three international patents. Prof. Mora-
bito is a Foreign Member of the Royal Academy of Doctors, Spain, in 2004, and
a member of the Institute of Spain, Barcelona Economic Network, in 2017. He
served as the Governor of the International Neural Network Society for 12 years
and as the President of the Italian Network Society from 2008 to 2014. He is
a member on the editorial boards of various international journals, including
the International Journal of Neural Systems, Neural Networks, International
Journal of Information Acquisition, and Renewable Energy.

Amir Hussain obtained his B.Eng. (with the highest 1st Class Honors)
and Ph.D. (in novel neural network architectures and algorithms) from the Uni-
versity of Strathclyde in Glasgow, Scotland, UK, in 1992 and 1997 respectively.
Following postdoctoral and academic positions at the University of West of Scot-
land (1996-98), University of Dundee (1998-2000), and University of Stirling
(2000-2018) respectively, he joined Edinburgh Napier University, in Scotland,
UK, in 2018, as Professor of Computing Science, and founding Director of the
Cognitive Big Data and Cybersecurity (CogBiD) Research Laboratory. His re-
search interests are cross-disciplinary and industry focussed, and include secure
and context-aware 5G-IoT driven AI, and multi-modal cognitive and sentic com-
puting techniques and applications. He has published more than 400 papers,
including over a dozen books and around 150 journal papers. He has led ma-
jor national, European and international projects and supervised more than 30
PhD students. He is founding Editor-in-Chief of two leading journals: Cogni-
tive Computation (Springer Nature), and BMC Big Data Analytics (BioMed
Central); and Chief-Editor of the Springer Book Series on Socio-Affective Com-
puting, and Cognitive Computation Trends. He has been appointed invited
Associate Editor of several prestigious journals, including the IEEE Transac-
tions on Neural Networks and Learning Systems, the IEEE Transactions on
Emerging Topics in Computational Intelligence, and (Elsevier) Information Fu-
sion. He is Vice-Chair of the Emergent Technologies Technical Committee of

the IEEE Computational Intelligence Society (CIS), and Chapter Chair of the
IEEE UK and RI Industry Applications Society.

Declaration of interests

The authors declare that they have no known competing financial interests or
personal relationships that could have appeared to influence the work reported
in this paper.

