
Network-Based Intrusion Detection With Support Vector Machines

The document discusses using machine learning techniques like support vector machines to detect network intrusions. It summarizes several papers on using methods like SVM, decision trees, and deep learning on datasets like NSL-KDD and DARPA to classify network traffic as normal or an attack like denial of service, probe, user to root, or remote to local. The proposed approach involves preprocessing the NSL-KDD dataset, then using machine learning algorithms to build models that can detect cyber attacks and intrusions.

Uploaded by

Inayat Baig

Network-Based Intrusion Detection with Support Vector Machines


B.Tech 6th Semester, Indian Institute of Information Technology, Allahabad
Group No. 33
Nikhil Kumar (IIT2018152), Prabal Tikeriha (IIT2018140), Vishal (IIT2018153), Inayat Baig (IIT2018165)

Summary/Literature review of all related papers

| S.no | Authors | Paper's title | Conference/Journal paper with year | Method/approach used | Application domain | Dataset used | Feature scope |
|---|---|---|---|---|---|---|---|
| 1 | Shailendra Singh and Sanjay Silakari | Cyber Attack Detection System based on Improved Support Vector Machine | IEEE 2017 | Improved Support Vector Machine | Companies that monitor traffic | KDDCUP2009 dataset | Network monitoring |
| 2 | Iqbal H. Sarker, Yoosef B. Abushark, Fawaz Alsolami and Asif Irshad Khan | IntruDTree: A Machine Learning Based Cyber Security Intrusion Detection Model | MDPI 2020 | IntruDTree | Companies that monitor traffic | Publicly available in Kaggle | Network monitoring |
| 3 | Mowei Wang, Yong Cui, Xin Wang, Shihan Xiao and Junchen Jiang | Machine Learning for Networking: Workflow, Advances and Opportunities | IEEE 2017 | Machine learning | Companies that monitor traffic | Publicly available in Kaggle | Network monitoring |
| 4 | Hongyu Liu and Bo Lang | Machine Learning and Deep Learning Methods for Intrusion Detection Systems: | MDPI 2019 | Machine learning, deep learning | Companies that monitor and research attacks | DARPA1998 | Attack detection |
| 5 | Yuan Zhang, Qinghai Yang, Konstantinos Kyriakopoulos and Sangarapillai Lambotharan | Anomaly Based Network Intrusion Detection System Using SVM | IEEE 2019 | SVM calculated with two features: packet count of control-plane and data-plane traffic | Provides the theoretical basis for using TCP traffic to detect intrusions and attacks on the Internet for network companies | Recent realistic Internet traffic dataset | Attack detection |
Base Paper-
1. Cyber Attack Detection System based on Improved Support Vector Machine.
(https://pdfs.semanticscholar.org/a159/8655d3a6b94344eb4027705109e0738c448f.pdf)

2. INSecS-DCS: A Highly Customizable Network Intrusion Dataset Creation Framework.
(https://www.researchgate.net/publication/325228601_INSecS-DCS_A_Highly_Customizable_Network_Intrusion_Dataset_Creation_Framework)

3. On generating network traffic datasets with synthetic attacks for intrusion detection.
(https://arxiv.org/abs/1905.00304)

Project Scope

The scope of this project is to update the existing WAN that connects all major sales offices
in the United States to corporate headquarters. The new WAN will be accessed by sales,
marketing, and training employees. It is beyond the scope of this project to update any
LANs that these employees use. It is also beyond the scope of this project to update the
networks in satellite and telecommuter offices.

The scope of the project might intentionally not cover some matters. For example, fixing
performance problems with a particular application might be intentionally beyond the scope
of the project. By stating upfront the assumptions you made about the scope of the project,
you can avoid any perception that your solution inadvertently fails to address certain
concerns.

Changes to the network design can be implemented easily because our network uses the Border
Gateway Protocol (BGP), whose attributes make it easy to divert or control the flow of data,
together with QoS, which can be used to allocate bandwidth to servers accordingly.
Problem Statement

The rapid increase in the connectivity and accessibility of computer systems has created
frequent opportunities for cyber attacks. Techopedia describes a cyber attack, or computer
network attack, as a “deliberate exploitation of computer systems, technology-dependent
enterprises and networks.” Computer networks and computer information systems are not the
only targets: cyber attacks are also infamous for attacking computer infrastructure and
people’s personal computers. Fundamentally, cyber-attack detection is a classification
problem, in which we separate the normal patterns of a system from its abnormal patterns
(attacks).
Tentative proposed approach:
Dataset Description

The NSL-KDD data set contains records of the Internet traffic seen by a simple intrusion
detection network. The records are the ghosts of the traffic encountered by a real IDS: only
the traces of its existence remain. The data set contains 43 features per record. The first
41 describe the traffic input itself, and the last two are the label (normal, or the name of
the attack) and the score (the difficulty level of the record).
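A minimal sketch of loading an NSL-KDD file with pandas, assuming the usual header-less, comma-separated layout of files such as `KDDTrain+.txt`. The 41 feature names follow the standard KDD Cup 99 naming; the names `label` and `score` for the last two columns are our own choice, and `load_nsl_kdd` is a hypothetical helper. The single record in the example is synthetic, only there to show the expected shape.

```python
import io

import pandas as pd

# The 41 traffic features, in file order (standard KDD Cup 99 naming).
FEATURES = [
    "duration", "protocol_type", "service", "flag", "src_bytes", "dst_bytes",
    "land", "wrong_fragment", "urgent", "hot", "num_failed_logins", "logged_in",
    "num_compromised", "root_shell", "su_attempted", "num_root",
    "num_file_creations", "num_shells", "num_access_files", "num_outbound_cmds",
    "is_host_login", "is_guest_login", "count", "srv_count", "serror_rate",
    "srv_serror_rate", "rerror_rate", "srv_rerror_rate", "same_srv_rate",
    "diff_srv_rate", "srv_diff_host_rate", "dst_host_count", "dst_host_srv_count",
    "dst_host_same_srv_rate", "dst_host_diff_srv_rate",
    "dst_host_same_src_port_rate", "dst_host_srv_diff_host_rate",
    "dst_host_serror_rate", "dst_host_srv_serror_rate", "dst_host_rerror_rate",
    "dst_host_srv_rerror_rate",
]
# Two trailing columns: the label (normal or attack name) and the score.
COLUMNS = FEATURES + ["label", "score"]


def load_nsl_kdd(source):
    """Read a header-less NSL-KDD CSV (path or file-like object) into a DataFrame."""
    return pd.read_csv(source, header=None, names=COLUMNS)


# One synthetic record (4 leading fields + 37 zeros = 41 features, then label, score).
row = ",".join(["0", "tcp", "http", "SF"] + ["0"] * 37 + ["normal", "21"])
example = load_nsl_kdd(io.StringIO(row))
print(example.shape)  # (1, 43)
```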

The data set contains four classes of attacks: Denial of Service (DoS), Probe, User to Root
(U2R), and Remote to Local (R2L). Each class is briefly described below:

● DoS is an attack that tries to shut down traffic flow to and from the target
system. The target is flooded with an abnormal amount of traffic that it can’t
handle, and it shuts down to protect itself. This prevents legitimate traffic
from reaching the network. The effect is similar to an online retailer getting
flooded with orders on a day with a big sale: because the network can’t handle
all the requests, it shuts down, preventing paying customers from purchasing
anything. This is the most common attack in the data set.

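● Probe or surveillance is an attack that tries to gather information about a
network, for example which hosts are reachable and which ports or services are
open. Like a thief casing a building, the attacker uses this information to find
weaknesses that can later be exploited to steal important information, whether
personal information about clients or banking information.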

● U2R is an attack that starts with a normal user account and tries to gain
access to the system or network as a super-user (root). The attacker exploits
vulnerabilities in the system to gain root privileges/access.

● R2L is an attack that tries to gain local access to a remote machine. The
attacker does not have local access to the system/network and tries to “hack”
their way into it over the network.
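Each record's label holds a specific attack name rather than one of the four classes, so the classes have to be recovered with a lookup table. The sketch below assumes the standard KDD Cup 99 grouping of the 22 attack names found in the training data; the NSL-KDD test file contains further attack names, which this partial table maps to `"unknown"`. `to_category` and `normal_dos_subset` are hypothetical helpers, the latter mirroring the Normal_DoS experiments in the Results section.

```python
import pandas as pd

# Standard KDD Cup 99 grouping of the 22 attack names seen in training data.
ATTACK_CLASSES = {
    "DoS": ["back", "land", "neptune", "pod", "smurf", "teardrop"],
    "Probe": ["ipsweep", "nmap", "portsweep", "satan"],
    "R2L": ["ftp_write", "guess_passwd", "imap", "multihop", "phf", "spy",
            "warezclient", "warezmaster"],
    "U2R": ["buffer_overflow", "loadmodule", "perl", "rootkit"],
}
# Invert into a name -> class lookup table.
NAME_TO_CLASS = {name: cls for cls, names in ATTACK_CLASSES.items() for name in names}


def to_category(label):
    """Map a raw label ('normal' or an attack name) to normal/DoS/Probe/R2L/U2R."""
    label = label.strip().rstrip(".")  # original KDD Cup 99 files end labels with '.'
    if label == "normal":
        return "normal"
    return NAME_TO_CLASS.get(label, "unknown")


def normal_dos_subset(df):
    """Keep only normal and DoS records and add a binary target (1 = DoS)."""
    category = df["label"].map(to_category)
    mask = category.isin(["normal", "DoS"])
    subset = df[mask].copy()
    subset["is_attack"] = (category[mask] == "DoS").astype(int)
    return subset
```

Framing the Normal_DoS task as a 0/1 target keeps it a plain binary classification problem, which every model in the methodology supports.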


METHODOLOGY:-
Here we briefly describe the working setup. The steps of our methodology are
performed sequentially, as follows:-

1. Data source:- We will use the NSL-KDD dataset for training.

2. Preprocessing Data:-
● Numerical attributes are extracted and scaled to have zero mean and
unit variance. Whenever we start with a dataset in machine learning,
we often assume that all features are equally important with respect
to the output and that no feature should dominate the others. That is
generally why we bring all features to the same scale: otherwise a
feature with a large range, such as a byte count, would dominate
distance-based models such as KNN and SVM.
● Random Oversampling: randomly duplicate examples in the minority
class. Random oversampling involves randomly selecting examples from
the minority class, with replacement, and adding them to the training
dataset. Its counterpart, random undersampling, involves randomly
selecting examples from the majority class and deleting them from the
training dataset. Both are referred to as “naive resampling” methods
because they assume nothing about the data and use no heuristics. This
makes them simple to implement and fast to execute, which is desirable
for very large and complex datasets.
● We will apply a feature reduction technique to the 41 features of the
NSL-KDD data set, using the Random Forest Classifier algorithm to select
the most discriminant features. Each record is a point in an n-dimensional
space, with each dimension corresponding to one feature of the record. When
a decision tree is trained, every split on a feature decreases the impurity
of the resulting nodes, so it is possible to compute how much each feature
decreases the impurity in total. The more a feature decreases the impurity,
the more important the feature is. In a random forest, the impurity decrease
from each feature can be averaged across trees to determine the final
importance of the variable.
● A one-hot encoder is used to convert the service column. One-hot encoding
converts non-numeric (categorical) data, such as service names, into a usable
numeric format by creating one binary column per category.
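The scaling and one-hot encoding steps can be combined in one scikit-learn `ColumnTransformer`. This is a minimal sketch, assuming a DataFrame with a categorical `service` column and numeric feature columns; the four records and the choice of numeric columns are illustrative only, not taken from NSL-KDD.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Illustrative records only (not real NSL-KDD rows).
train = pd.DataFrame({
    "service": ["http", "ftp", "http", "smtp"],
    "src_bytes": [215.0, 1200.0, 181.0, 5000.0],
    "count": [1.0, 8.0, 2.0, 30.0],
})
numeric_cols = ["src_bytes", "count"]

preprocess = ColumnTransformer(
    [
        # Zero mean, unit variance for the numeric attributes.
        ("scale", StandardScaler(), numeric_cols),
        # One binary column per service; unseen services become all zeros.
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["service"]),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

X_train = preprocess.fit_transform(train)
# Columns: 2 scaled numeric + 3 one-hot (ftp, http, smtp).
print(X_train.shape)  # (4, 5)
```

Fitting the transformer on the training data only, then calling `preprocess.transform` on test data, keeps test statistics out of the scaling; `handle_unknown="ignore"` stops a service that never appears in training from raising an error at test time.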

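The remaining two preprocessing steps, random oversampling and random-forest feature ranking, can be sketched together. This assumes a NumPy feature matrix `X` and integer labels `y`; the data is a synthetic, imbalanced stand-in from `make_classification`, `random_oversample` is a hypothetical helper written with plain NumPy rather than a resampling library, and keeping the top `k` features is one possible reduction rule, not necessarily the one the project will use. The ranking uses scikit-learn's `feature_importances_`, which is the mean impurity decrease averaged over the forest's trees.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows (with replacement)
    until every class has as many rows as the largest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, n in zip(classes, counts):
        if n < target:
            idx = np.flatnonzero(y == cls)
            extra = rng.choice(idx, size=target - n, replace=True)
            X_parts.append(X[extra])
            y_parts.append(y[extra])
    return np.concatenate(X_parts), np.concatenate(y_parts)


# Synthetic, imbalanced stand-in for the training split (roughly 90% / 10%).
X, y = make_classification(n_samples=500, n_features=10, n_informative=3,
                           weights=[0.9, 0.1], random_state=0)
X_res, y_res = random_oversample(X, y)

# Rank features by mean impurity decrease across the forest's trees.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_res, y_res)
ranking = np.argsort(forest.feature_importances_)[::-1]
k = 5  # hypothetical number of features to keep
X_reduced = X_res[:, ranking[:k]]

print("class counts after oversampling:", np.bincount(y_res))
print("top features:", ranking[:k])
```

Oversampling should be applied to the training split only; duplicating rows before splitting would put copies of the same record in both the training and test data and inflate the measured accuracy.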
3. Topic Model (Subproblem):- Here we will classify whether a network intrusion
has occurred and, if so, which type of attack it is. The models will be trained
using a KNeighborsClassifier model, a Logistic Regression model, a Gaussian Naive
Bayes model, a Decision Tree model, an SVM model, and an iSVM model (the base
paper's model).
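Training and comparing the listed scikit-learn models on one split could look like the sketch below. The data is a synthetic stand-in from `make_classification`, not NSL-KDD, so the printed accuracies say nothing about the project's results. The base paper's iSVM has no scikit-learn implementation, so it is left out here, and `SVC` with its default RBF kernel stands in for the plain SVM.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a preprocessed Normal vs. DoS feature matrix.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    "KNeighborsClassifier": KNeighborsClassifier(n_neighbors=5),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "GaussianNB": GaussianNB(),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
    "SVC": SVC(kernel="rbf"),
}

accuracies = {}
for name, model in models.items():
    # The scaler is fitted on the training split only, inside each pipeline.
    pipe = make_pipeline(StandardScaler(), model)
    pipe.fit(X_train, y_train)
    accuracies[name] = pipe.score(X_test, y_test)  # mean accuracy on test split

for name, acc in sorted(accuracies.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.4f}")
```

Wrapping each model in a pipeline with the scaler keeps the comparison fair: every model sees the same preprocessing, fitted without looking at the test split.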

4. Testing:- We will manually inject attacks into existing packet traces using ID2T
(Intrusion Detection Dataset Toolkit) and convert them into a dataset using a network
intrusion dataset creation framework (INSecS-DCS).

5. Evaluation:- The models will be evaluated, and the best-performing model will
be selected.
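Accuracy alone can hide poor detection of the minority class, so the evaluation could also report a confusion matrix and per-class recall. The toy labels below are made up purely for illustration: five of six predictions are correct, yet half of the DoS records are missed.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Made-up labels for illustration only.
y_true = ["normal", "normal", "normal", "normal", "DoS", "DoS"]
y_pred = ["normal", "normal", "normal", "normal", "normal", "DoS"]

acc = accuracy_score(y_true, y_pred)                         # 5 correct out of 6
cm = confusion_matrix(y_true, y_pred, labels=["normal", "DoS"])
dos_recall = recall_score(y_true, y_pred, pos_label="DoS")   # 1 of 2 DoS found

print(f"accuracy = {acc:.3f}")           # accuracy = 0.833
print(cm)                                # rows = true class, columns = predicted
print(f"DoS recall = {dos_recall:.2f}")  # DoS recall = 0.50
```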

Language/tools for implementation

Language used-
Python

Major Dependencies-

● numpy
● sklearn
● pandas
● matplotlib
● seaborn
Results:-

Self-evaluation of model (Normal_DoS)

| Model | Accuracy |
|---|---|
| Naive Bayes Classifier | 0.9737686173767133 |
| Decision Tree Classifier | 0.9999480272634127 |
| KNeighborsClassifier | 0.9977577476500898 |
| LogisticRegression | 0.980836909552589 |

Evaluation on test data:- (Models are evaluated on the KDD test data along with
self-created test data generated using INSecS-DCS.)

| Model (Normal_DoS) | Test accuracy |
|---|---|
| Naive Bayes Classifier | 0.8336536781408352 |
| Decision Tree Classifier | 0.8165880365775525 |
| KNeighborsClassifier | 0.8666200710583027 |
| LogisticRegression | 0.8418661541149747 |

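Final RESULT:-

The KNeighborsClassifier model performed best on the test data, with an accuracy of
about 0.867, the highest of all the models evaluated. The Decision Tree Classifier scored
highest in self-evaluation (about 0.99995) but lowest on the test data (about 0.817),
which suggests it overfit the training data. We therefore found the KNeighborsClassifier
to be the best of these models for Normal vs. DoS network intrusion detection in our
experiments.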
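The model choice can be checked directly against the reported numbers. This short sketch copies the accuracies from the Results section into two dictionaries (the layout is ours), picks the model with the highest test accuracy, and computes each model's drop from self-evaluation to test accuracy.

```python
# Accuracies copied from the Results section (Normal_DoS experiments).
self_eval = {
    "Naive Bayes Classifier": 0.9737686173767133,
    "Decision Tree Classifier": 0.9999480272634127,
    "KNeighborsClassifier": 0.9977577476500898,
    "LogisticRegression": 0.980836909552589,
}
test_acc = {
    "Naive Bayes Classifier": 0.8336536781408352,
    "Decision Tree Classifier": 0.8165880365775525,
    "KNeighborsClassifier": 0.8666200710583027,
    "LogisticRegression": 0.8418661541149747,
}

best_on_test = max(test_acc, key=test_acc.get)
# Drop from self-evaluation to test accuracy; a large drop hints at overfitting.
gap = {name: self_eval[name] - test_acc[name] for name in test_acc}
largest_gap = max(gap, key=gap.get)

print("best on test data:", best_on_test)  # best on test data: KNeighborsClassifier
for name in sorted(gap, key=gap.get, reverse=True):
    print(f"{name}: self-eval {self_eval[name]:.4f}, "
          f"test {test_acc[name]:.4f}, drop {gap[name]:.4f}")
```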
