Smart Intrusion Detection System Compris
Index Terms: Intrusion Detection System, CART, Random Forest, Naive Bayes, Back-Propagation based MLP.

I. INTRODUCTION
The Internet has become an important medium of everyday communication through online collaboration, email, e-learning, and so forth. Additionally, small and large organizations have expanded their customer base through direct marketing, web shopping, and inter-organization communication over the web. With the enormous growth of computer networks, systems as a whole suffer from security weaknesses that are difficult and costly for manufacturers to fix [1]. Some threats arise from the use of ineffective and inefficient security tools, which invite intrusions from internet hackers [2]. It is therefore clear that established prevention technologies such as malware removal programs, antivirus programs, and firewalls fail to give absolute

II. RELATED WORK
Many studies have already been carried out on various machine learning and deep learning methods and their application in different fields. The same has not yet been done exhaustively in the field of information security. The gap in the IDS field was studied using the resources available to date. Applying machine learning and deep learning methods to IDS is a new theme in contemporary research.

A. Study Regarding IDS
By detection approach, IDS frameworks fall into two categories: anomaly-based detection and misuse-based detection [8], [9]. Misuse-based IDS can recognize known attacks effectively but fails to discover new attacks that do not match the rules in its database [10]. Consequently, the database must be continually updated to store the signature of every known attack. This type of IDS is clearly unable to identify new attacks unless it is trained on them [11]. Anomaly-based IDS can
Published on October 8, 2020.
Shah Md. Istiaque, Bangladesh University of Professionals, Bangladesh.
Asif Iqbal Khan, Mawlana Bhashani Science and Technology University, Bangladesh.
Sajjad Waheed, Professor Dr., Mawlana Bhashani Science and Technology University, Bangladesh.
build a model of normal behavior and flag any significant deviation from that model as an intrusion. This type of IDS can identify new or unknown attacks. However, it has a high rate of false alarms [12], [13].

B. Study Regarding Dataset
The most significant challenge in building an attack detection framework is whether to generate real network traffic or to use the available benchmark datasets. Datasets obtained from real network traffic are criticized because they introduce greater uncertainty, and there is no methodology that clearly explains how to separate normal traffic from attack traffic precisely. This is the reason for using benchmark datasets to implement the attack detection frameworks in this paper. The available attack datasets [14]-[17] are DARPA 1998, KDD Cup99, NSL KDD, UNSW NB15, etc. The DARPA 1998, KDD Cup99, and NSL KDD datasets consist of 42 attributes including the class label. The UNSW NB15 dataset consists of 48 attributes including the class label.

C. Review Regarding Detection
Multiple detection methods have been reported in the literature. They include traditional detection, ML-based detection, and DL neural-network-based detection. A few studies also use hybrid methods. Various detection techniques are analyzed in the following discussion.

D. Traditional Detection
A sandbox, in computer security, is a security component in which a separate, confined environment is created and several functions are restricted [18]. A sandbox is regularly used when running untested code or untrusted programs from third-party sources. Sandboxes also have a few constraints. Some sandbox tools deal only with specific types of PDF attacks, such as MDScan for JavaScript [19] and Nozzle for heap spraying [20], while others only record the dynamic behavior of a system and still require manual analysis for detection, as in the case of CWSandbox [21].
Huaibin Wang, Haiyun Zhou, and Chundong Wang have discussed different VM-based IDSs [22]. They recommend deploying multiple VM-based IDSs in each layer to observe specific virtual components. Additionally, they have also proposed the cloud alliance view, by the

detection. Shilpa et al. [26] used principal component analysis on the NSL-KDD dataset for feature selection and a dimension reduction approach for evaluating anomaly detection. In general, network intrusion detection has been broadly improved by applying data mining and machine learning techniques, which have largely relied on individual behavior patterns in network traffic data. Support Vector Machine (SVM) is used as a method to evaluate IDS in one study [27]. Among the various IDS approaches, SVM acts as a classifier, with false alarm rate and detection rate as the measures of performance. The authors of [28] used a Markov chain implementation as the classifier and the Apriori algorithm to remove isolated data from the database, and also used them to judge the performance of a NIDS. K-Means, an unsupervised algorithm, is also used for classification; it defines unlabeled classes into which the data are clustered.

III. PRELIMINARIES
This section provides a brief background on network intrusion and the four intelligent algorithms used in this study.

A. Concept of Network Intrusion
Modern technology has pushed the boundaries of digital intrusion and digital threat. The attack on Estonia, the attack on Iran’s nuclear power plant, digital espionage, and financial damage are all among the newest threats of modern internet technology. Digital intrusion is the first step and the most common type of attack or threat [29]. From there, malware is injected or other significant tools are deployed. Therefore, if intrusions are monitored and checked, a first line of defense can be achieved.

[Figure: stages of an intrusion. Recoverable labels: 1. Reconnaissance; 2. Scanning and Weaponization; 3. Access and Escalation; 6. Action on objective; 7. Obfuscation and Exfiltration.]
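To make the idea of monitoring for intrusions concrete, the two IDS categories from Section II-A (misuse-based and anomaly-based detection) can be contrasted in a minimal sketch. Everything here is an illustrative assumption rather than the method of this paper: the signature strings, the single "bytes per connection" feature, the baseline numbers, and the three-standard-deviation threshold are all hypothetical.

```python
from statistics import mean, stdev

# Hypothetical signature database of known attack patterns (illustrative only).
KNOWN_SIGNATURES = {"' OR 1=1 --", "../../etc/passwd"}


def misuse_detect(payload: str) -> bool:
    """Misuse-based check: flag a payload only if it contains a known signature."""
    return any(sig in payload for sig in KNOWN_SIGNATURES)


class AnomalyDetector:
    """Anomaly-based check: model normal behavior by mean and standard deviation."""

    def __init__(self, normal_values, threshold=3.0):
        self.mu = mean(normal_values)
        self.sigma = stdev(normal_values)
        self.threshold = threshold  # allowed deviation, in standard deviations

    def is_anomalous(self, value: float) -> bool:
        return abs(value - self.mu) > self.threshold * self.sigma


# Assumed baseline: bytes per connection seen during normal operation.
detector = AnomalyDetector([480, 510, 495, 505, 490, 520, 500])

known_attack = "GET /index.php?id=' OR 1=1 --"
novel_attack = "GET /index.php?id=new-unseen-exploit"

print(misuse_detect(known_attack))  # matches a stored signature
print(misuse_detect(novel_attack))  # no stored signature, so it is missed
print(detector.is_anomalous(5000))  # far from the baseline, so it is flagged
print(detector.is_anomalous(545))   # unusual but possibly legitimate: a potential false alarm
```

The signature check misses the unseen request because nothing in its database matches, while the statistical model flags any large deviation, including traffic that may be harmless. This mirrors the trade-off described in Section II-A.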
D. Classification and Regression Tree
Leo Breiman introduced the term CART, which refers to the decision tree algorithm. It is used for classification or regression predictive modeling problems. Classically, this algorithm is referred to as “decision trees”; however, some platforms such as R use the more modern term CART [30].

means one particular feature does not affect the other. Therefore, this technique is called naïve [32].

G. Multi-Layer Perceptron
A multi-layer perceptron is a deep learning technique that involves more than one linear layer (a combination of neurons). In a three-layered network, the first layer is the input layer, the last is the output layer, and a hidden layer lies in between [33].
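The three-layered arrangement described for the multi-layer perceptron (input layer, one hidden layer, output layer) can be sketched as a forward pass. This is a minimal illustration under stated assumptions: the layer sizes, the weights in `params`, the sigmoid activation, and the 0.5 decision threshold are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One fully connected layer: weighted sum per neuron, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def forward(x, params):
    """Pass input x through the hidden layer, then through the output layer."""
    hidden = dense(x, params["W_hidden"], params["b_hidden"])
    return dense(hidden, params["W_out"], params["b_out"])


# Hypothetical weights for a network with 3 inputs, 2 hidden neurons, 1 output.
params = {
    "W_hidden": [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]],
    "b_hidden": [0.0, 0.1],
    "W_out": [[1.2, -0.7]],
    "b_out": [0.05],
}

score = forward([1.0, 0.0, 0.5], params)[0]
label = "Intrusion" if score >= 0.5 else "Normal"  # assumed decision threshold
print(f"score={score:.3f} -> {label}")
```

In practice the weights would be learned by back-propagation rather than fixed by hand; the sketch only shows how data flows from the input layer to the output layer.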
[Figure: multi-layer perceptron with input, hidden, and output layers.]
[Figure: decision tree. Height > 175 cm? Yes: Male. No: Weight > 75 kg? Yes: Male. No: Female.]
Fig. 2: Classification and Regression Tree.
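The tree of Fig. 2 can be read as nested threshold tests, which the sketch below writes out directly. The `classify` function follows the splits shown in the figure; the `gini` helper is an addition for illustration, since CART implementations commonly score candidate splits by Gini impurity, although the text here does not state which criterion was used.

```python
from collections import Counter


def classify(height_cm: float, weight_kg: float) -> str:
    """Walk the two-split tree of Fig. 2 from the root to a leaf."""
    if height_cm > 175:   # root split
        return "Male"
    if weight_kg > 75:    # second split, reached only when height <= 175
        return "Male"
    return "Female"


def gini(labels) -> float:
    """Gini impurity of a set of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


print(classify(180, 60))
print(classify(170, 80))
print(classify(170, 60))
print(gini(["Male", "Male", "Female", "Female"]))  # evenly mixed node
print(gini(["Male", "Male"]))                      # pure node
```

A split is attractive when the child nodes it produces are purer (lower impurity) than the parent node.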
[Bar chart: accuracy (%) on Normal and Intrusion data for Random Forest, CART, Naive Bayes, and MLP.]
Fig. 5. Comparison among the ML and DL Methods for finding out Accuracy using Generic Features.
[Bar chart: accuracy (%) on Normal and Intrusion data for Random Forest, CART, Naive Bayes, and MLP.]
Fig. 6. Comparison among the ML and DL Methods for finding out Accuracy using Selective Features.
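The comparisons report accuracy separately for normal data and for intrusions. One plausible reading, assumed here because the text does not define the metric, is per-class accuracy: the percentage of records of each true class that the classifier labels correctly. The sketch below computes it on made-up labels; `per_class_accuracy` is a hypothetical helper, not code from the paper.

```python
def per_class_accuracy(y_true, y_pred):
    """Return {class: percentage of that class's records predicted correctly}."""
    totals, correct = {}, {}
    for truth, pred in zip(y_true, y_pred):
        totals[truth] = totals.get(truth, 0) + 1
        if truth == pred:
            correct[truth] = correct.get(truth, 0) + 1
    return {c: 100.0 * correct.get(c, 0) / totals[c] for c in totals}


# Made-up ground truth and predictions, for illustration only.
y_true = ["Normal"] * 4 + ["Intrusion"] * 5
y_pred = ["Normal", "Normal", "Normal", "Intrusion",
          "Intrusion", "Intrusion", "Intrusion", "Intrusion", "Normal"]

result = per_class_accuracy(y_true, y_pred)
print(result)
```

Reporting the two classes separately avoids hiding a weak intrusion detection rate behind a large share of correctly classified normal traffic.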
B. Application of ML and DL Methods with Selective Features
In the second experiment, the four methods (Random Forest, CART, Naive Bayes, and MLP) are again applied to measure accuracy on both the normal flow of data and intrusions, but this time using selective features. Here, CART and Naive Bayes provided better accuracy, although Random Forest again provided the best intrusion detection, at 98.266%. The performance of MLP also showed a significant improvement. Fig. 6 presents the overall performance graphically.

TABLE II: TEST ACCURACY FOR NORMAL FLOW OF DATA AND INTRUSION DETECTION USING SELECTIVE 15 FEATURES
Type of Algorithm    Type of Data    Accuracy (%)
Random Forest        Normal          90.903
Random Forest        Intrusion       98.266
CART                 Normal          98.246
CART                 Intrusion       93.167
Naive Bayes          Normal          98.331
Naive Bayes          Intrusion       93.458
MLP                  Normal          95.007
MLP                  Intrusion       93.652

TABLE III: COMPARISON OF ACCURACY AMONG ML AND DL METHODS USING GENERIC AND SELECTIVE FEATURES
Type of Algorithm    Type of Data    Accuracy, Generic Features (%)    Accuracy, Selective Features (%)
Random Forest        Normal          85.387                            90.903
Random Forest        Intrusion       98.547                            98.266
CART                 Normal          99.086                            98.246
CART                 Intrusion       96.51                             93.167
Naive Bayes          Normal          85.606                            98.331
Naive Bayes          Intrusion       93.265                            93.458
MLP                  Normal          93.387                            95.007
MLP                  Intrusion       94.312                            93.652
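The figures in Table III can be summarized with a short calculation. The sketch below copies the accuracy values from Table III and computes, for each method, the unweighted mean over the Normal and Intrusion rows under each feature set, as well as the method with the highest intrusion accuracy. The unweighted mean is an assumption made for illustration; the true overall accuracy depends on the class proportions in the test set, which are not given here.

```python
# Accuracy (%) from Table III: method -> data type -> (generic, selective).
TABLE_III = {
    "Random Forest": {"Normal": (85.387, 90.903), "Intrusion": (98.547, 98.266)},
    "CART":          {"Normal": (99.086, 98.246), "Intrusion": (96.51, 93.167)},
    "Naive Bayes":   {"Normal": (85.606, 98.331), "Intrusion": (93.265, 93.458)},
    "MLP":           {"Normal": (93.387, 95.007), "Intrusion": (94.312, 93.652)},
}


def mean_accuracy(method: str, column: int) -> float:
    """Unweighted mean over Normal and Intrusion; column 0 = generic, 1 = selective."""
    rows = list(TABLE_III[method].values())
    return sum(row[column] for row in rows) / len(rows)


for method in TABLE_III:
    generic = mean_accuracy(method, 0)
    selective = mean_accuracy(method, 1)
    print(f"{method:13s} generic={generic:.3f} selective={selective:.3f} "
          f"change={selective - generic:+.3f}")

# Method with the highest intrusion accuracy under each feature set.
best_generic = max(TABLE_III, key=lambda m: TABLE_III[m]["Intrusion"][0])
best_selective = max(TABLE_III, key=lambda m: TABLE_III[m]["Intrusion"][1])
print(best_generic, best_selective)
```

Under this unweighted mean, Random Forest, Naive Bayes, and MLP gain from the selective features, while the mean for CART drops from 97.798 to 95.7065. Random Forest has the highest intrusion accuracy with both feature sets.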
C. Analytical Review
Experimental results in both cases showed reasonably good performance. Using selective features and eliminating a few less important parameters also improved the overall performance. After analyzing the overall results, Classification and Regression Tree is found to be a stable and better method, keeping in mind that Random Forest provided the best intrusion detection in both cases.

[Bar chart legend: Accuracy with 41 Features; Accuracy with 15 Features.]
Fig. 7. Graphical Representation and Comparison of Accuracy among ML Methods using Generic and Selective Features.

D. Proposed IDS Model
A model consisting of an ML method and a DL method is proposed in Fig. 8. Here, MLP with the back-propagation algorithm is used as the DL method and Random Forest is taken as the ML method. The methods were selected on the basis of their accuracy.
[29] Clarence Chio and David Freeman, "Machine Learning and Security," O’Reilly, p. 6.
[30] https://machinelearningmastery.com/classification-and-regression-trees-for-machine-learning/ Accessed on 25 Aug 2020.
[31] https://towardsdatascience.com/understanding-random-forest-58381e0602d2 Accessed on 25 Aug 2020.
[32] https://towardsdatascience.com/naive-bayes-classifier-81d512f50a7c.
[33] https://medium.com/@xzz201920/multi-layer-perceptron-mlp-4e5c020fd28a Accessed on 25 Aug 2020.