
Western University

Scholarship@Western

Electronic Thesis and Dissertation Repository

8-24-2018 9:30 AM

Recurrent Neural Network Architectures Toward Intrusion Detection

Wafaa Anani, The University of Western Ontario

Supervisor: Jagath Samarabandu, The University of Western Ontario
A thesis submitted in partial fulfillment of the requirements for the Master of Engineering Science degree in Electrical and Computer Engineering
© Wafaa Anani 2018

Follow this and additional works at: https://ir.lib.uwo.ca/etd

Part of the Electrical and Computer Engineering Commons, and the Other Computer Engineering
Commons

Recommended Citation
Anani, Wafaa, "Recurrent Neural Network Architectures Toward Intrusion Detection" (2018). Electronic
Thesis and Dissertation Repository. 5625.
https://ir.lib.uwo.ca/etd/5625

This Dissertation/Thesis is brought to you for free and open access by Scholarship@Western. It has been accepted
for inclusion in Electronic Thesis and Dissertation Repository by an authorized administrator of
Scholarship@Western. For more information, please contact [email protected].
Abstract

Recurrent Neural Networks (RNN) show remarkable results in sequence learning, particularly in architectures with gated unit structures such as Long Short-term Memory (LSTM). In recent years, several variants of the LSTM architecture have been proposed, mainly to overcome its computational complexity. This dissertation presents a novel empirical study investigating and evaluating LSTM architecture variants, such as Gated Recurrent Unit (GRU), Bi-Directional LSTM, and Dynamic-RNN for LSTM and GRU, specifically for detecting network intrusions. The investigation is designed to identify the learning time required by each architecture and to measure the intrusion prediction accuracy. Each architecture was evaluated on the DARPA/KDD Cup'99 intrusion detection dataset. Feature selection mechanisms, namely Principal Component Analysis (PCA) and the Random Forest (RF) algorithm, were also implemented to identify and remove non-essential variables that do not affect the accuracy of the prediction models. The results showed that RF captured more significant features than PCA: with RF-selected features, accuracy was 97.86% for LSTM and 96.59% for GRU, whereas with PCA it was 64.34% for LSTM and 67.97% for GRU. In terms of RNN architectures, the prediction accuracy of each variant improved at specific parameter settings, yet with a large dataset and suitable training time, the standard vanilla LSTM tended to lead all other RNN architectures, scoring 99.48%. Dynamic-RNN variants also offered strong accuracy (Dynamic-RNN GRU scored 99.34%), but they tended to take longer to train at high training cycles; Dynamic-RNN LSTM needed 25284.03 seconds at 1000 training cycles. The GRU architecture was introduced as a variant to reduce LSTM complexity; developed with fewer parameters, it yields a faster-trained model, needing 1903.09 seconds where LSTM required 2354.93 seconds for the same training cycle. It also showed equivalent performance with respect to parameters such as hidden layers and time-steps. BLSTM offered an impressive training time of 190 seconds at 100 training cycles, though its accuracy, which did not exceed 90%, was below that of the other RNN architectures.

Keywords: Recurrent Neural Networks, Gated Recurrent Unit, Long Short-term Memory,
Skip-LSTM, Bi-Directional LSTM, Dynamic-RNN, Intrusion Detection, Deep Learning.

Acknowledgements

Foremost, I would like to express my sincere gratitude to my advisor, Dr. Jagath Samarabandu of the Electrical and Computer Engineering department at Western University, for his continuous support of my MESc study and research, as well as his patience, motivation, enthusiasm, and immense knowledge. The door to Dr. Samarabandu's office was always open whenever I was stuck on an issue or had a question to ask. He consistently directed me on the right path whenever needed. His guidance assisted me in the writing of this thesis.

I would also like to sincerely thank my labmates, Gobi and Nadun, for their companionship during this time and their feedback during our invaluable lab meetings.

I must express my gratitude to my parents for providing me with unfailing support and continuous encouragement throughout my years of study. And finally to my sister Lina, who experienced all the ups and downs that I faced in my research and kept me motivated.

Contents

Certificate of Examination i

Abstract i

Acknowledgements ii

List of Figures v

List of Tables vi

Abbreviations vii
1 Introduction 1
1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.4 Purpose, Scope, and Contribution . . . . . . . . . . . . . . . . . . . . . . . . . 3


1.5 Research Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.6 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2 Literature Review 8

3 Background 15
3.1 Intrusion Detection and Machine Learning . . . . . . . . . . . . . . . . . . . . 15
3.2 Recurrent Neural Network (RNN) . . . . . . . . . . . . . . . . . . . . . . . . 17
3.3 Long Short-term Memory (LSTM) . . . . . . . . . . . . . . . . . . . . . . . . 18
3.4 Gated Recurrent Unit (GRU) . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.5 Bi-Directional LSTM (BLSTM) . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.6 Dynamic-RNN LSTM/GRU . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.7 Random Forest (RF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.8 Principal Component Analysis (PCA) . . . . . . . . . . . . . . . . . . . . . . 23

3.9 Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.9.1 Learning Rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.9.2 Hidden Layers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.9.3 Hidden Units . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.9.4 Time-Steps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.10 Evaluation Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

4 Experimental Results 26
4.1 Dataset Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.2 Data Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.3 Feature Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.3.1 Random Forest (RF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.3.2 Principal Component Analysis (PCA) . . . . . . . . . . . . . . . . . . 29

5 Results, Analysis, and Discussion 32
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.2 Phase I: Feature Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.3 Phase II: RNN Architectures for IDS . . . . . . . . . . . . . . . . . . . . . . 33
5.3.1 Long Short-term Memory (LSTM) . . . . . . . . . . . . . . . . . . . . 35
5.3.2 Gated Recurrent Unit (GRU) . . . . . . . . . . . . . . . . . . . . . . . 36
5.3.3 Bi-Directional LSTM (BLSTM) . . . . . . . . . . . . . . . . . . . . . 36
5.3.4 Dynamic-RNN LSTM/GRU . . . . . . . . . . . . . . . . . . . . . . . 37
5.3.5 Overall Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

6 Conclusion and Future Work 48

Bibliography 50

A 56

Curriculum Vitae 59

List of Figures

1.1 Research Methodology. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.1 LSTM “Memory Cell”. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

3.1 Simple RNN Architecture. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18


3.2 LSTM Cell Architecture. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.3 GRU Architecture. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.4 Bi-Directional LSTM Architecture. . . . . . . . . . . . . . . . . . . . . . . . 21

4.1 Feature Selection Based on RF. (y-axis) shows Feature Importances and (x-axis) shows Feature IDs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.2 Features Selection Based on the PCA Classifier. . . . . . . . . . . . . . . . . . 31
5.1 LSTM and GRU Accuracy Comparison between . . . . . . . . . . . . . . . . . 34
5.2 Learning Rate Cost for LSTM . . . . . . . . . . . . . . . . . . . . . . . . . . 37
5.3 LSTM Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
5.4 LSTM Training Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

5.5 Learning Rate Cost for GRU . . . . . . . . . . . . . . . . . . . . . . . . . . . 40


5.6 GRU Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.7 GRU Training Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5.8 Learning Rate Cost for DRNN LSTM . . . . . . . . . . . . . . . . . . . . . . 43
5.9 Learning Rate Cost for DRNN GRU . . . . . . . . . . . . . . . . . . . . . . . 44
5.10 DRNN LSTM and DRNN GRU Accuracy . . . . . . . . . . . . . . . . . . . . 45
5.11 RNN Architectures over all Training Time . . . . . . . . . . . . . . . . . . . . 46
5.12 Comparison of the Optimized LSTM Model Accuracy Rate with other LSTM Models Proposed in the Literature . . . . . . . . . . . . . . . . . . . . . . . . 47

List of Tables

4.1 KDD Cup 1999 Datasets (Number of Samples) . . . . . . . . . . . . . . . . . 27


4.2 Top 12 Selected Features Based on the RF classifier . . . . . . . . . . . . . . . 30
4.3 Top 12 Selected Features Based on the PCA classifier . . . . . . . . . . . . . . 31

5.1 LSTM and GRU Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33


5.2 Parameter Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

5.3 Vanilla LSTM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
5.4 GRU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
5.5 BLSTM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
5.6 DRNN Accuracy for Each Learning Rate . . . . . . . . . . . . . . . . . . . . . 39
5.7 DRNN LSTM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.8 DRNN GRU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.9 RNN Architecture Overall Comparison for Accuracy . . . . . . . . . . . . . . 45
5.10 RNN Architecture Overall Comparison for Training Time . . . . . . . . . . . 46
5.11 Comparison of the Optimized LSTM Model Accuracy Rate with other LSTM Models Proposed in the Literature . . . . . . . . . . . . . . . . . . . . . . . . 47

A.1 All the 41 Features of KDD Cup’99 Dataset . . . . . . . . . . . . . . . . . . . 56

List of Abbreviations

BLSTM Bi-Directional LSTM

DRNN Dynamic-RNN

GRU Gated Recurrent Unit

IDS Intrusion Detection Systems

LSTM Long Short-term Memory


PCA Principal Component Analysis
RF Random Forest

RNN Recurrent Neural Network



Chapter 1

Introduction

1.1 Overview

Intrusion detection is a key research area in network security. A common approach to intrusion detection is detecting anomalies in network traffic; however, network threats are evolving at an unprecedented rate. The gap between the evolution of threats and the current detection response time of a network leaves the system vulnerable to attacks [1]. Over the years, a number of machine learning techniques have been developed to detect network intrusions using packet prediction [2]. The Recurrent Neural Network (RNN) is among the most popular methods for performing classification and other analysis on sequences of data. A subset of the RNN is Long Short-term Memory (LSTM), introduced by Hochreiter and Schmidhuber (1997) [3]. LSTM is a key algorithm for machine learning tasks that involve sequential data. Successful deployment of LSTM has led industry to invest heavily in applying the algorithm to a wide range of applications, including voice recognition [4], [5], handwriting recognition [6], machine translation, and social media filtering, making LSTM a natural candidate for Intrusion Detection Systems (IDS). Yet this algorithm incurs high computational costs when deployed at scale, in both time and memory complexity [7], [8]. To overcome this challenge, several variations of the algorithm have been proposed, including the Gated Recurrent Unit (GRU), Bi-Directional long short-term memory (BLSTM), Dynamic-RNN for LSTM and GRU, and Skip-RNN. This thesis presents a novel empirical study investigating and implementing these LSTM architecture variants for intrusion detection based on predicting packet sequences. The implementation of each architecture was evaluated in terms of training time, prediction accuracy (normal or intrusion), sensitivity to parameters, and several performance metrics including precision, recall, and false alarm rate. Experiments were conducted on the full KDD Cup'99 intrusion detection dataset [9]; the algorithms were evaluated on the entire dataset, rather than on the 10% subset most commonly used in the intrusion detection literature.
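For reference, the performance metrics named above can all be computed from the four confusion-matrix counts of a binary intrusion/normal classifier. A minimal sketch (the counts below are made-up toy values, not results from this thesis):

```python
def ids_metrics(tp, fp, tn, fn):
    """Precision, recall (detection rate), false alarm rate, and accuracy.

    tp: intrusions correctly flagged    fp: normal traffic wrongly flagged
    tn: normal traffic passed           fn: intrusions missed
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    false_alarm = fp / (fp + tn)   # fraction of normal traffic wrongly flagged
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return precision, recall, false_alarm, accuracy

# Toy counts for illustration only.
precision, recall, false_alarm, accuracy = ids_metrics(tp=90, fp=5, tn=95, fn=10)
```

A low false alarm rate matters as much as high accuracy in this domain, since flagged normal traffic wastes analyst time.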

1.2 Motivation
PR

The Long Short-term Memory (LSTM) network is a subset of the RNN, an architecture that excels at storing sequential short-term memories and retrieving them many time-steps later. An RNN has the capability to learn from previous time-steps of the input data: the data at each time-step is processed, stored, and given as input to the next time-step, where the algorithm uses the previously stored data to process the new information. Such an architecture, given robust computational power, is well suited to security applications, in particular those dealing with streaming data such as sequences of network packets. The security domain is constantly researching ways to keep up with the evolution of intrusions. The field of RNNs in intrusion detection is still in the initial stages of research and has immense potential: adapting these gated algorithms could yield insights much faster and provide intrusion detection close to real time. Though neural network structures are complex, with the right set of parameters they can be tuned to obtain light-weight functionality. This served as the motivation to explore gated RNNs and to focus on the comparison between Long Short-term Memory (LSTM) and its variants, such as GRU, BLSTM, Dynamic-RNN for LSTM/GRU, and Skip-RNN.
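The recurrence just described, each time-step's hidden state feeding the next, can be sketched in a few lines. This is a scalar toy version with arbitrary weights, not a trained model from this thesis:

```python
import math

def rnn_step(x, h_prev, w_x=0.7, w_h=0.4, b=0.1):
    """One vanilla RNN time-step: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b).

    Scalar for clarity; real cells apply weight matrices to feature vectors,
    and gated cells (LSTM/GRU) add gates controlling what the state keeps.
    """
    return math.tanh(w_x * x + w_h * h_prev + b)

# The hidden state h carries information from earlier items in the sequence
# (e.g. earlier packets) forward to later time-steps.
h = 0.0
for x in [0.5, -0.2, 0.8]:   # a toy 3-step input sequence
    h = rnn_step(x, h)
```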

1.3 Problem Statement

The main goal of this dissertation is to explore and analyze various RNN architectures, tuning each architecture with a different set of parameters, such as hidden layers, time-steps, training cycles, and learning rate, with the goal of identifying the parameters that achieve both a shorter training time for each algorithm and high accuracy in predicting whether a network stream packet is an intrusion or not.

I am looking to answer the following questions:

• What is the best architecture for the intrusion detection domain?

• What set of parameters helps achieve high accuracy and shorter training time?

• What is the impact of feature selection on each algorithm?
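The second question amounts to a search over a small parameter grid. A sketch of such an enumeration (the value ranges below are illustrative placeholders, not the grid actually used in this thesis):

```python
from itertools import product

# Hypothetical search space over the parameters named above.
grid = {
    "learning_rate": [0.1, 0.01, 0.001],
    "hidden_layers": [1, 2],
    "time_steps": [10, 20],
    "training_cycles": [100, 500, 1000],
}

# One dict per candidate configuration; each would be trained once and
# scored on prediction accuracy and training time.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
```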

1.4 Purpose, Scope, and Contribution

The purpose of this research is to evaluate different RNN algorithms on an intrusion detection dataset. The best-known algorithms were identified: LSTM, GRU, BLSTM, and Dynamic-RNN LSTM/GRU. This research has immense potential to open doors for improving the intrusion detection application domain, as it offers a better detection rate for catching an attack before network security is compromised. The scope of this project is to find the most suitable architecture to fulfill the research purpose, by evaluating and analyzing the performance of the selected algorithms in terms of prediction accuracy and the time each algorithm requires to be trained on an intrusion detection dataset. This study did not measure detection time, which is the time elapsed between the initial breach of a network by an attacker and the discovery of that breach.

The main contributions of this thesis are summarized below:

• Implemented, evaluated, and compared the different RNN algorithms (LSTM, GRU, BLSTM, and DRNN LSTM/GRU) in terms of training time and prediction accuracy.

• Identified the sensitivity of each algorithm with respect to its individual parameters, such as learning rate, hidden layers, and training cycles, then measured the prediction accuracy.

• Introduced DRNN LSTM/GRU to the intrusion detection domain for the first time.

• Calculated the performance metrics, including precision, recall, and false alarm rate, for each algorithm.

• Presented two classification algorithms, Random Forest (RF) and Principal Component Analysis (PCA), for feature selection in the intrusion detection domain. A comparison between the two algorithms was conducted to find the one that best represents the data in terms of prediction accuracy.

• Presented overall results illustrating the best-case scenario obtained for each algorithm to be employed in the intrusion detection domain.

• The proposed optimized LSTM model scored 99.43%, correctly detecting 19,593 more attacks out of 3,925,650 when compared with other LSTM models.

1.5 Research Methodology

Domain issues and challenges with regard to intrusion detection were identified through literature review. The most frequently used algorithms and experiments were identified, as well as the accuracy rate reported for each. As a result, it became evident that RNN architectures are new techniques in the intrusion detection domain. Proof of concept for using LSTM and GRU algorithms in the field of intrusion detection is scarce due to a lack of intensive experiments. None of the literature consulted identified the best architecture or compared the algorithms' parameters to achieve the best performance with high accuracy. Some challenges facing IDS make this goal difficult to achieve: classification of data and labeling of unlabeled data is a challenging task, as the current high volume of network traffic increases the number of attacks.

A module for each selected architecture was therefore developed. Feature selection was then implemented to ensure the best representation of all the data and to better present the underlying problem to each prediction model, resulting in improved model accuracy on unseen data. The experiment in this research comprised two phases. The first is the feature selection phase, using two different selection mechanisms: Principal Component Analysis (PCA) and Random Forest (RF). RF learns from inputs and improves performance over time; for intrusion detection, the aim is for the algorithm to learn over time which of the network features are the most informative. PCA selects a new feature set to reduce redundancy among the features and improve performance [10], [11]. It had to be decided which algorithm would offer better performance for the intrusion detection domain and be suitable for implementation in IDS on real-time traffic. The second phase is to evaluate the selected algorithms on the full KDD Cup'99 dataset, a commonly used intrusion detection dataset. A baseline was created with initial values for each parameter, based on the literature review. Different values were used with each run to fine-tune the parameters and to identify those that showed the best prediction accuracy. The research methodology is illustrated in Figure 1.1.
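The PCA side of the feature-selection phase can be sketched with a covariance eigendecomposition. The toy matrix below stands in for network-traffic features; it is not KDD Cup'99 data:

```python
import numpy as np

# 5 samples x 3 features; feature 3 is an exact multiple of feature 1,
# so most of the variance should concentrate in the first principal component.
X = np.array([
    [2.0, 0.1, 4.0],
    [1.8, 0.2, 3.6],
    [2.2, 0.1, 4.4],
    [0.5, 0.9, 1.0],
    [0.4, 1.1, 0.8],
])

Xc = X - X.mean(axis=0)                     # center each feature
cov = np.cov(Xc, rowvar=False)              # 3x3 feature covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)      # eigh: cov is symmetric
order = np.argsort(eigvals)[::-1]           # components by decreasing variance
explained = eigvals[order] / eigvals.sum()  # variance explained per component
Z = Xc @ eigvecs[:, order][:, :2]           # project onto the top-2 components
```

Dropping the low-variance components is what removes the redundant features; the RF alternative instead ranks the original features by their contribution to classification.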


Figure 1.1: Research Methodology.

1.6 Thesis Organization

Chapter 2 provides a literature review related to security, in particular the use of machine learning mechanisms for intrusion detection. The main focus is on RNN architectures and how far other researchers have advanced the field. Chapter 3 describes each algorithm's architecture and its equations, along with all related topics, including intrusion detection, parameter definitions, and the evaluation metrics used in this research. In Chapter 4, experimental results are
