Intrusion Detection in Software Defined Network Using Machine Learning

Contact us for project abstract, enquiry, explanation, code, execution, documentation. Phone/Whatsap : 9573388833 Email : [email protected] Website : https://dcs.datapro.in/contact-us-2 Tags: btech, mtech, final year project, datapro, machine learning, cyber security, cloud computing, blockchain,

Uploaded by

dataprodcs
Copyright
© © All Rights Reserved

ABSTRACT

Intrusion detection systems (IDSs) are currently drawing a great deal of interest as a
key part of system defense. An IDS collects network traffic information from some point
on the network or computer system and then uses this information to secure the network.
Distinguishing intrusive network traffic from normal traffic is very difficult and
time-consuming: an analyst must review a large and wide-ranging body of data to find the
sequence of an intrusion on a network connection. A method is therefore needed that can
detect network intrusions in a way that reflects current network traffic.
In this study, a novel method is proposed for finding intrusion characteristics for an IDS
using a genetic algorithm, a machine learning and data mining technique. Rules are generated
by classification, using a genetic algorithm together with a decision tree. These rules
determine the characteristics of an intrusion and are then applied in the genetic algorithm
as prevention, so that besides detecting the existence of an intrusion, the system can also
deny it.
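As a rough illustration of how a genetic algorithm can search for a detection rule, the sketch below evolves a single threshold for a rule of the form "duration > threshold means intrusion". The toy records, the single-feature rule, and the selection, crossover and mutation operators are all assumptions made for illustration; they are not taken from the project's code.

```python
import random

# Hypothetical toy records: (connection duration in seconds, is_intrusion)
RECORDS = [(1, False), (2, False), (3, False), (4, False),
           (8, True), (9, True), (12, True), (15, True)]

def fitness(threshold, records=RECORDS):
    """Fraction of records classified correctly by 'duration > threshold => intrusion'."""
    correct = sum((duration > threshold) == label for duration, label in records)
    return correct / len(records)

def evolve(generations=30, pop_size=10, seed=0):
    """Evolve a threshold with selection, crossover and mutation (illustrative only)."""
    rng = random.Random(seed)
    population = [rng.uniform(0, 20) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]      # selection: the fitter half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = (a + b) / 2                    # crossover: average of two parents
            child += rng.gauss(0, 1.0)             # mutation: small random shift
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

best = evolve()
print(f"rule: duration > {best:.2f} => intrusion, fitness = {fitness(best):.2f}")
```

Because the fitter half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next. A rule found this way could then be used to deny matching traffic, which is the prevention step the abstract describes.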

TABLE OF CONTENTS

Abstract
List of Figures

Chapter 1
1.1 Introduction
1.2 Existing System
1.3 Problem Statement

Chapter 2
2.1 Designing a Network Intrusion Detection System Based on Machine Learning for Software Defined Networks
2.2 A Deep Learning Approach for Network Intrusion Detection System
2.3 Intrusion Preventing System using Intrusion Detection System Decision Tree Data Mining
2.4 A Deep Learning Approach to Network Intrusion Detection

Chapter 3
3.1 Existing System
3.2 Proposed System
3.3 Problem Statement
3.4 Block and Flow Diagram
3.5 Decision Tree
3.6 Genetic Algorithm

Chapter 4
4.1 Machine Learning
4.2 Classification
4.3 Machine Learning Work Flow

Chapter 5
5.1 Results
5.2 Code

Conclusion
References
Publication

LIST OF FIGURES

Section  Title
3.4  Block Diagram and Flow Diagram
3.6  Genetic Algorithm
4.2  Machine Learning Classification
4.3  Machine Learning Work Flow
4.3  Data Collection

Chapter 1
INTRODUCTION

Approaches to intrusion detection can be broadly divided into two types: misuse detection and
anomaly detection. A misuse detection system detects known types of attacks (intrusions) by
looking for predefined intrusion patterns in system audit traffic. An anomaly detection system
first learns a profile of normal activity and then flags all system events that do not match
the established profile. The main advantage of misuse detection is its high detection rate for
known attacks; its weakness is the difficulty of finding new or unforeseen attacks.
The advantage of anomaly detection lies in its ability to identify novel (or unforeseen)
attacks, at the expense of a high false positive rate. Machine learning techniques based on
network monitoring have been applied in diverse fields. For example, a social media network
monitoring system based on bi-directional long short-term memory neural networks has been
proposed for analyzing and detecting traffic accidents.
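The contrast between misuse detection and anomaly detection drawn at the start of this chapter can be sketched in a few lines. The signature list, the single packets-per-second feature and the three-standard-deviation rule below are hypothetical choices made for illustration only.

```python
from statistics import mean, stdev

# Hypothetical signature list for the misuse detector (not from the source).
KNOWN_SIGNATURES = ("/etc/passwd", "' OR 1=1", "../../")

def misuse_detect(payload: str) -> bool:
    """Flag a payload that contains any predefined intrusion pattern."""
    return any(sig in payload for sig in KNOWN_SIGNATURES)

def build_profile(normal_values):
    """Learn a normal-activity profile: mean and standard deviation of one feature."""
    return mean(normal_values), stdev(normal_values)

def anomaly_detect(value, profile, k=3.0):
    """Flag a value lying more than k standard deviations from the learned mean."""
    mu, sigma = profile
    return abs(value - mu) > k * sigma

# Profile learned from normal traffic only, e.g. packets per second
profile = build_profile([98, 102, 100, 101, 99, 100])

print(misuse_detect("GET /index.html"))          # no known pattern present
print(misuse_detect("GET /../../etc/passwd"))    # matches a known pattern
print(anomaly_detect(101, profile))              # close to the normal profile
print(anomaly_detect(500, profile))              # far from the normal profile
```

The misuse detector can only catch payloads that contain a listed pattern, while the anomaly detector flags anything far from the learned profile, including unusual but harmless traffic. This is the trade-off between missed novel attacks and false positives described above.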
The proposed method retrieves traffic-related information from social media (Facebook and
Twitter) using query-based crawling: this process collects sentences related to any traffic
event, such as jams, road closures, etc. Subsequently, several pre-processing techniques are
carried out, such as stemming, tokenization, POS tagging and segmentation, in order to transform
the retrieved data into structured form. Thereafter, the data are automatically labeled as
'traffic' or 'non-traffic' using a latent Dirichlet allocation (LDA) algorithm. Traffic-labeled
data are analyzed into three types: positive, negative, and neutral. The output from this stage
is a sentence labeled according to whether it is traffic or non-traffic, and with the polarity
of that traffic sentence (positive, negative or neutral). Then, using the bag-of-words (BoW)
technique, each sentence is transformed into a one-hot encoding representation in order to feed
it to the bi-directional LSTM neural network (Bi-LSTM). After the learning process, the neural
network performs multi-class classification using the softmax layer in order to classify the
sentence in terms of location, traffic event and polarity type. The proposed method compares
different classical machine learning and advanced deep learning approaches in terms of accuracy,
F-score and other criteria.
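The two building blocks named in this pipeline, a bag-of-words encoding and a softmax output, can be illustrated as follows. This is a minimal sketch with made-up sentences and scores; it does not reproduce the Bi-LSTM itself, and the helper names are assumptions.

```python
import math

def build_vocabulary(sentences):
    """Map every distinct lower-cased word to a fixed index."""
    words = sorted({w for s in sentences for w in s.lower().split()})
    return {w: i for i, w in enumerate(words)}

def encode(sentence, vocab):
    """Binary bag-of-words vector: 1 where a vocabulary word occurs in the sentence."""
    vec = [0] * len(vocab)
    for w in sentence.lower().split():
        if w in vocab:
            vec[vocab[w]] = 1
    return vec

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)                       # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

vocab = build_vocabulary(["road closed", "heavy traffic jam"])
print(encode("traffic jam on road", vocab))
print(softmax([2.0, 1.0, 0.1]))           # hypothetical scores for three classes
```

Words outside the vocabulary (here "on") are simply ignored by the encoder, and the class with the largest softmax probability is taken as the prediction.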

EXISTING SYSTEM:
Today, networks have become an essential part of public infrastructure with the inception of
public and private cloud computing. The traditional networking approach has become too complex.
This complexity has become a barrier to creating new and innovative services within a single
data center, makes it difficult to interconnect data centers and enterprises, and is an even
bigger barrier to the continued growth of the Internet in general.

PROBLEM STATEMENT:

• Distinguishing intrusive network traffic from normal traffic is very difficult and
time-consuming.
• An analyst must review a large and wide-ranging body of data to find the sequence of an
intrusion on the network connection.
• A method is needed that can detect network intrusions in a way that reflects current
network traffic.
• An IDS combined with a firewall forms a so-called intrusion prevention system (IPS), which,
besides detecting the existence of an intrusion, can also deny it as prevention.

Chapter 2

Designing a Network Intrusion Detection System Based on Machine Learning for Software Defined Networks
Abstract:
Software-defined networking (SDN) has recently been developed and put forward as a promising
solution for future internet architecture. With SDN, the centrally managed and controlled
network has become more flexible and visible. On the other hand, these advantages bring a more
vulnerable environment and dangerous threats, causing network breakdowns, system paralysis,
online banking fraud and robberies. These issues have a significantly destructive impact on
organizations, companies and even economies. Accurate, high-performance, real-time systems are
essential to counter them successfully. Extending a network intrusion detection system (NIDS)
with intelligent machine learning algorithms through a software-defined network has attracted
considerable attention in the last decade. Big data availability, the diversity of data analysis
techniques, and the massive improvement in machine learning algorithms enable the building of an
effective, reliable and dependable system for detecting the different types of attacks that
frequently target networks. This study demonstrates the use of machine learning algorithms for
traffic monitoring to detect malicious behavior in the network as part of the NIDS in the SDN
controller. Classical and advanced tree-based machine learning techniques (Decision Tree,
Random Forest and XGBoost) are chosen to demonstrate attack detection. The NSL-KDD dataset is
used for training and testing the proposed methods; it is considered a benchmark dataset for
several state-of-the-art approaches in NIDS. Several advanced preprocessing techniques are
performed on the dataset in order to extract the best form of the data, which produces
outstanding results compared to other systems. Using just five of the 41 features of NSL-KDD,
a multi-class classification task is conducted by detecting whether there is an attack and
classifying the type of attack (DDoS, PROBE, R2L, and U2R), accomplishing an accuracy of 95.95%.

Introduction
A network intrusion detection system is a process for discovering the existence of malicious or
unwanted packets in the network. This is done using real-time traffic monitoring to find out
whether any unusual behavior is present in the network. Big data, powerful computation
facilities, and the expansion of network size increase the number of tasks that must be carried
out simultaneously in real time. NIDS should therefore be careful, accurate, and precise in
monitoring, which has not been the case with traditional methods. On the other hand, the rapid
increase in the accuracy of machine learning algorithms is highly impressive. The introduction
of machine learning is driven by the increasing demand for improved performance on different
types of network. Moreover, the software-defined network (SDN) implementation of the
network-based intrusion detection system (NIDS) has opened a frontier for its deployment,
considering the increasing scope and typology of security risks of modern networks. The rapid
growth in the volume of network data and connected devices carries inherent security risks. The
adoption of technologies such as the Internet of Things (IoT), artificial intelligence (AI), and
quantum computing has increased the threat level, making network security challenging and
necessitating a new paradigm in its implementation. Various attacks have overwhelmed previous
approaches (classified into signature-based intrusion detection systems and anomaly-based
intrusion detection systems), increasing the need for advanced, adaptable and resilient security
implementation. For this reason, the traditional network design platform is being transformed
into the evolving SDN implementation.

Monitoring data and analyzing it over time are essential to the process of predicting future
events, such as risks, attacks and diseases. The more details are formed, discovered and
documented through analyzing very large-scale data, the more resources are saved and the more
the working environment remains normal, without variations. Big data analytics (BDA) research
in the supply chain has become key to managing and preventing risks. BDA for humanitarian supply
chains can aid donors in deciding what is appropriate in situations such as disasters, where it
can improve the response and minimize human suffering and deaths. BDA and data monitoring using
machine learning can help in identifying and understanding the interrelationships between the
reasons, difficulties, obstacles and barriers that guide organizations in taking the most
efficient and accurate decisions in risk management processes. This could impact entire
organizations and countries, producing a hugely significant improvement in the process.

Machine learning techniques based on network monitoring have been applied in diverse fields.
For example, a social media network monitoring system based on bi-directional long short-term
memory neural networks has been proposed for analyzing and detecting traffic accidents. The
proposed method retrieves traffic-related information from social media (Facebook and Twitter)
using query-based crawling: this process collects sentences related to any traffic event, such
as jams, road closures, etc. Subsequently, several pre-processing techniques are carried out,
such as stemming, tokenization, POS tagging and segmentation, in order to transform the
retrieved data into structured form. Thereafter, the data are automatically labeled as 'traffic'
or 'non-traffic' using a latent Dirichlet allocation (LDA) algorithm. Traffic-labeled data are
analyzed into three types: positive, negative, and neutral. The output from this stage is a
sentence labeled according to whether it is traffic or non-traffic, and with the polarity of
that traffic sentence (positive, negative or neutral). Then, using the bag-of-words (BoW)
technique, each sentence is transformed into a one-hot encoding representation in order to feed
it to the bi-directional LSTM neural network (Bi-LSTM). After the learning process, the neural
network performs multi-class classification using the softmax layer in order to classify the
sentence in terms of location, traffic event and polarity type. The proposed method compares
different classical machine learning and advanced deep learning approaches in terms of accuracy,
F-score and other criteria.

Many initiatives and workshops have been conducted in order to improve and develop healthcare
systems using machine learning, such as [12]. In these workshops several machine learning
algorithms have been used, such as K-Nearest Neighbors, logistic regression, K-means clustering,
Random Forest (RF), etc., together with deep learning algorithms such as CNNs, RNNs, fully
connected layers and auto-encoders. This variety of techniques allows researchers to deal with
several data types, such as medical imaging, history, medical notes, video data, etc.
Therefore, different topics and applications are introduced with significant performance
results, such as causal inference in investigations of Covid-19 and the prediction of diseases
such as disorders and heart diseases. Using intelligent ensemble deep learning methods,
healthcare monitoring is carried out for the prediction of heart diseases. Real-time health
status monitoring can predict and help prevent heart attacks before they occur. For disease
prediction, the proposed ensemble deep learning approach achieved an accuracy of 98.5%. The
proposed model takes two types of data that are transferred to and saved in an online cloud
database. The first is the data transferred from sensors; these sensors are placed in different
places on the body in order to extract more than 10 different types of medical data. The second
type is the daily electronic medical records from doctors, which include various types of data,
such as smoking history, family diseases, etc. The features are fused using the feature fusion
Framingham Risk factors technique, which executes two tasks at a time: fusing the data together
and then extracting a fused and informative feature from this data. Then different
pre-processing techniques, such as normalization, missing-value filtering and feature
weighting, are used to transform the data into a structured and well-prepared form.
Subsequently, an ensemble deep learning algorithm learns from the data in order to predict
whether a heart disease will occur or the threat is absent.

IDS refers to a mechanism capable of identifying or detecting intrusive activities. In a broader
view, this encompasses all the processes used in the discovery of unauthorized uses of network
devices or computers. This is achieved through software designed specifically to detect unusual
or abnormal activities. According to several surveys and sources in the literature, IDS can be
classified into four types (HIDS, NIDS, WIDS, NBA). NIDS is an inline or passive-based intrusion
detection technique. The scope of its detection targets network and host levels. The only
architecture that fits and works with NIDS is the managed network. The advantage of using NIDS
is that it costs less and is quicker in response, since there is no need to maintain sensor
programming at the host level. The performance of monitoring the traffic is close to real-time;
NIDS can detect attacks as they occur. However, it has the following limitations. It does not
indicate whether such attacks are successful, since it has restricted visibility inside the host
machine. There is also no effective way to analyze encrypted network traffic to detect the type
of attack.
Moreover, NIDS may have difficulty capturing all packets in a large or busy network. Thus, it
may fail to recognize an attack launched during a period of high traffic.

SDN provides a novel means of network implementation, stimulating the development of a new type
of network security application. It adopts the concept of programmable networks through the
deployment of logically centralized management. The network deployment and configuration are
virtualized to simplify complex processes, such as orchestration, network optimization, and
traffic engineering. It creates a scalable architecture that allows sufficient and reliable
services based on certain types of traffic. The global view approach to a network enhances
flow-level control of the underlying layers. Implementing NIDS over SDN becomes a major
effective security defense mechanism for detecting network attacks from the network entry point.
NIDS has been implemented and investigated for decades to achieve optimal efficiency. It
represents an application or device for monitoring network traffic for suspicious or malicious
activity and policy violations. Such activities include malware attacks, untrustworthy users,
security breaches, and DDoS. NIDS focuses on identifying anomalous network traffic or behavior;
its efficiency means that network anomaly detection is adequately implemented as part of the
security implementation. Since it is nearly impossible to prevent all threats and attacks, NIDS
ensures early detection and mitigation.
However, the advancement in NIDS has not instilled sufficient confidence among practitioners,
since most solutions still use less capable, signature-based techniques. This study aims to
increase the focus on several points:
• Choosing the right algorithm for the right task, which depends on the data types, size, and
network behavior and needs.
• Implementing an optimized development process by preparing and selecting the benchmark
dataset in order to build a promising NIDS.
• Analyzing the data and finding, shaping, and engineering the important features, using
several preprocessing techniques stacked together in an intelligent order to find the best
accuracy with the smallest data representation and size.
• Proposing an integrated and complete development process using those algorithms and
techniques, from the selection of the dataset to the evaluation of the algorithms using
different metrics, which can be extended to other NIDS applications.

This study enhances the implementation of NIDS by deploying different machine learning
algorithms over SDN. Tree-based machine learning algorithms (XGBoost, random forest (RF), and
decision tree (DT)) were implemented to enhance the monitoring and accuracy performance of NIDS.
The proposed method is trained on network traffic packet data collected from large-scale
resources and servers, the NSL-KDD dataset, to perform two tasks at a time: detecting whether
there is an attack and classifying the type of attack.
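All three tree-based algorithms mentioned here are built from the same basic step: choosing the feature and threshold that best separate the classes. The sketch below shows that step using Gini impurity on a handful of invented flows. The feature names, values and labels are hypothetical, and this is not the NSL-KDD pipeline used in the study.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best binary split."""
    best = (None, None, gini(labels))
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue                  # a split must send rows to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best[2]:
                best = (f, threshold, score)
    return best

# Hypothetical toy flows: (duration, bytes_sent); labels mix normal traffic and attack types
rows = [(1, 200), (2, 180), (1, 5000), (2, 5200), (30, 210), (40, 190)]
labels = ["normal", "normal", "dos", "dos", "probe", "probe"]

feature, threshold, score = best_split(rows, labels)
print(feature, threshold, round(score, 4))
```

A decision tree applies this split search recursively to each resulting subset. Because the leaves carry class labels, one tree answers both questions at once: whether a flow is an attack and, if so, which type.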

Background and Related Work:


Integrating machine learning algorithms into SDN has attracted significant attention.
In one study, a solution was proposed that addressed the issues in KDD Cup 99 by performing an
extensive experimental study, using the NSL-KDD dataset to achieve the best accuracy in
intrusion detection. The experimental study was conducted on five popular and efficient machine
learning algorithms (RF, J48, SVM, CART, and Naïve Bayes). The correlation feature selection
algorithm was used to reduce the complexity of the features, leaving only 13 features of the
NSL-KDD dataset. This study tests the NSL-KDD dataset's performance for real-world anomaly
detection in network behavior. The five classic machine learning models (RF, J48, SVM, CART, and
Naïve Bayes) were trained on all 41 features against the five classes (normal, DoS, Probe, U2R,
and R2L), achieving average accuracies of 97.7%, 83%, 94%, 85%, and 70%, respectively. The same
models were trained again using the reduced 13 features, achieving average accuracies of 98%,
85%, 95%, 86%, and 73%.

In another study, a deep neural network model was proposed to detect intrusions in the SDN. The
NSL-KDD dataset was used to train and test the model. The neural network was constructed with
five primary layers: one input layer with six inputs, three hidden layers with (12, 6, 3)
neurons, and one output layer with two dimensions. The proposed method was trained on six
features chosen from the 41 features in the NSL-KDD dataset, which are basic and traffic
features that can easily be obtained from the SDN environment. The proposed method calculates
the accuracy, precision and recall, achieving an F1-score of 0.75. A second evaluation compared
the model with seven classic machine learning models (RF, NB, NB Tree, J48, DT, MLP, and SVM)
proposed in earlier work, and the model achieved sixth place out of eight. The same author
extended the approach using a gated recurrent unit neural network (GRU-RNN) for SDN anomaly
detection, achieving accuracy of up to 89%. In addition, the Min-Max normalization technique is
used for feature scaling to improve and boost the learning process.
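Min-Max normalization rescales each feature to the range [0, 1] using x' = (x - min) / (max - min), so that features with large raw ranges (such as byte counts) do not dominate features with small ranges. A minimal sketch follows; the function name and the sample values are illustrative assumptions.

```python
def min_max_scale(column):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:                      # constant feature: avoid division by zero
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

durations = [0, 5, 10, 20]            # hypothetical raw feature values
print(min_max_scale(durations))       # [0.0, 0.25, 0.5, 1.0]
```

In practice the minimum and maximum are computed on the training set only and then reused to scale the test set, so that no information from the test data leaks into training.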
The SVM classifier, integrated with the principal component analysis (PCA) algorithm, was used
for an intrusion detection application. The NSL-KDD dataset is used in this approach to train
and optimize the model for detecting abnormal patterns. A Min-Max normalization technique was
proposed to handle the differing scale ranges of the data with the lowest misclassification
error. The PCA algorithm is selected as a statistical technique to reduce the NSL-KDD dataset's
complexity, reducing the number of trainable parameters that need to be learned. The nonlinear
radial basis function kernel was chosen for SVM optimization. Detection rate (DR), false alarm
rate (FAR), and correlation coefficient metrics were chosen to evaluate the proposed model,
which achieved an overall average accuracy of 95% using 31 features in the dataset.

In [32], an extreme gradient-boosting (XGBoost) classifier was used to distinguish between two
classes, i.e., normal and DoS. The detection method was analyzed and conducted over POX, an
open-source SDN controller platform for prototyping and developing SDN-based techniques. Mininet
was used to emulate the network topology to simulate real-time SDN-based cloud detection.
Logistic regression was selected as the learning algorithm, with a regularization penalty term
to prevent overfitting. The XGBoost term was added and combined with the logistic regression
algorithm to boost the computations by constructing structure trees. The dataset used in this
approach was KDD Cup 1999, from which 400 K samples were selected to construct the training set.
Two normalization techniques were used: one logarithmic and one Min-Max-based. The average
overall accuracies of XGBoost, RF and SVM were 98%, 96% and 97%, respectively.

Based on DDoS attack characteristics, a detection system was simulated with the Mininet and
Floodlight platforms using the SVM algorithm [5]. The proposed method categorizes the
characteristics into six tuples, which are calculated from the network packets. These
characteristics are the speed of the source IP (SSIP), the speed of the source port, the
standard deviation of flow packets, the deviation of flow bytes (SDFB), the speed of flow
entries, and the ratio of pair-flow. Based on these six characteristics, the SVM classifier
judges whether the current network state is normal or under attack. Attack flow (AF), DR, and
FAR were chosen as metrics, and the system achieved an average accuracy of 95%.

In TSDL, a model with two stages of deep neural networks was designed and proposed for NIDS,
using a stacked auto-encoder integrated with softmax in the output layer as a classifier. TSDL
was designed and implemented for multi-class classification of attack detection. Down-sampling
and other preprocessing techniques were performed over different datasets in order to improve
the detection rate as well as the monitoring efficiency. The detection accuracy for UNSW-NB15
was 89.134%. Different models of neural networks, such as variational auto-encoders, seq2seq
structures using long short-term memory (LSTM) and fully connected networks, were proposed in
[34] for NIDS. The proposed approach was designed and implemented to differentiate between
normal and attack packets in the network, using several datasets, such as NSL-KDD, UNSW-NB15,
KYOTO-HONEYPOT, and MAWILAB. A variety of preprocessing techniques, such as one-hot encoding
and normalization, have been used for data preparation, feature manipulation and selection, and
smooth training of the neural networks. Those factors are designed mainly, but not only, to
enable the neural networks to learn complex features from different scopes of a single packet.

Using four hidden layers, a deep neural network model [35] was illustrated and implemented on
KDD Cup 99 for monitoring intrusion attacks. Feature scaling and encoding were used for data
preprocessing and lower data usage. More than 50 features were used to perform this task on
different datasets. Therefore, GPU hardware was used in order to handle this large number of
features with lower training time. A supervised [36] adversarial auto-encoder neural network was
proposed for NIDS.
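Several of the evaluation metrics quoted in this survey (accuracy, precision, recall, F1-score, detection rate and false alarm rate) are all derived from the same four confusion-matrix counts. The sketch below uses the standard definitions with attack traffic as the positive class; the counts in the example are invented for illustration.

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts (attack = positive class).

    tp: attacks flagged as attacks      fp: normal traffic flagged as attack
    tn: normal traffic passed as normal fn: attacks missed
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # recall is also called the detection rate (DR)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "detection_rate": recall,
        "false_alarm_rate": fp / (fp + tn),
        "f1": 2 * precision * recall / (precision + recall),
    }

# Hypothetical counts for a test set of 100 attack and 100 normal connections
m = detection_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

The sketch does not guard against empty classes, so a count combination such as tp + fp = 0 would raise a division error. Reporting DR and FAR together matters because accuracy alone can look high on a dataset where attacks are rare.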
It combined GANs and a variational auto-encoder. A GAN consists of two different neural
networks competing with each other, known as the generator and the discriminator. The aim of
the competition is to minimize the objective function as much as possible, by minimizing the
Jensen-Shannon divergence. The generator tries to generate fake data packets, while the
discriminator determines whether this data is real or fake; in other words, it checks whether a
packet is an attack or normal. In addition, the proposed method integrates a regularization
penalty into the model structure to control overfitting. The results were reasonable in the
detection rate of U2R and R2L but lower in the others. Multi-channel deep learning of features
for NIDS was presented in [37], using an AE involving a CNN, two fully connected layers and an
output to the softmax classifier. The evaluation was done over three different datasets, KDD
Cup 99, UNSW-NB15 and CICIDS, with an average accuracy of 94%. The proposed model provides
effective results; however, the structure and the characteristics of the attack were not
highlighted clearly.
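The Jensen-Shannon divergence mentioned in connection with GANs measures how far apart two probability distributions are. It is symmetric, it is zero when the distributions are identical, and with natural logarithms it is bounded by ln 2. A minimal sketch for discrete distributions is given below; it is included only to illustrate the quantity, not the training procedure of the cited model.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

real = [0.7, 0.2, 0.1]        # hypothetical distribution of real traffic features
fake = [0.1, 0.2, 0.7]        # hypothetical distribution produced by a generator
print(js_divergence(real, fake))
print(js_divergence(real, real))   # identical distributions give 0.0
```

As the generator improves, the distribution of its samples moves closer to the real one and this divergence shrinks toward zero.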
The proposed method enhances the implementation of NIDS by deploying machine learning over
SDN. It introduces a machine learning algorithm for network monitoring within the NIDS
implementation on the central controller of the SDN. In this paper, enhanced tree-based machine
learning algorithms are proposed for anomaly detection. Using only five features, a multi-class
classification task is conducted by detecting whether there is an attack or not and classifying the
type of attack.
3. Proposed Method
In this section, we discuss and explain each component and its role in the NIDS architecture. As
shown in Figure 1, the SDN architecture can be divided into three main layers, as follows:

System Architecture Layers


NIDS component architecture is constructed in three main parts as follows:
• The infrastructure layer consists of two main parts: hardware and software components.
The hardware components are devices such as routers and switches. The software
components are those that interface with the hardware, such as OpenFlow switches.