Research
ABSTRACT
An Intrusion Detection System (IDS) is a tool or software application that monitors network or system activity and detects malicious activity. Network protection must evolve to account for new threats and the approaches that counter them. The key role of the IDS is to secure resources against attacks. Several approaches, methods and algorithms for intrusion detection help to detect a wide range of attacks. The main objective of this paper is to provide a complete system that detects intrusion attacks using machine learning, which identifies unknown attacks using information gained from known attacks. The paper explains the preprocessing techniques, the comparison of models in training as well as testing, and the evaluation technique.
Keywords : Intrusion Detection System, Host, Network, Detection Techniques, Support vector machine, Machine
Learning, NIDS, HIDS.
CSEIT2062166 | Accepted : 01 May 2020 | Published : 05 May 2020 | May-June-2020 [ 6 (3) : 61-71 ]
Jayesh Zala et al Int J Sci Res CSE & IT, May-June-2020; 6 (3) : 61-71
network access and information system dependency. Given the extent and complexity of current network security risks, the discussion should focus less on whether to use intrusion prevention and more on the capabilities of intrusion detection. Intrusions arise from network attackers, from authorized users of systems who seek additional privileges to which they are not entitled, and from authorized users who misuse the privileges they have been granted.

For the recognition and prevention of attacks, Intrusion Detection Systems (IDS) use a network-based or host-based approach. In each case, these products look for signatures (specific patterns) that typically indicate malicious or suspicious intent. A variety of algorithms for classifying different kinds of network intrusions have been proposed, but heuristic methods do not verify the accuracy of their findings. If a succinct performance metric is available, the exact efficacy of an intrusion detection program in identifying malicious sources can be recorded.

II. NEED OF AN INTRUSION DETECTION SYSTEM

Now that the Internet has become part of everyday life, the world of business is linked to it. Each day, many people connect to the Internet to benefit from the modern business concept known as e-business. Connectivity has therefore become very important in today's e-business.

The Internet has two sides for business. On the one hand, it provides an excellent opportunity to attract customers; on the other, it poses major risks to the company. The Internet has both harmless and harmful users. When a company opens its information system to harmless Internet users, malicious users or hackers can also gain access to the company's internal infrastructure, for several reasons: software bugs, lapses in administrative security, and systems left in their default configuration.

Intruders use a range of methods, including password cracking, peer-to-peer attacks, sniffing attacks, eavesdropping, application-layer attacks and more, to exploit the aforementioned vulnerabilities and compromise sensitive systems. Therefore, some form of protection is needed to guard the organization's private resources against both the Internet and users inside the organization.

Some of the common attacks against which an intrusion detection system can be used are:

2.1 Denial of Service (DoS): an attack in which an adversary directs a deluge of traffic requests at a system to make its computing or memory resources too busy or too full to handle legitimate requests, thereby denying legitimate users access to the machine.
2.2 Probing Attack (Probe): probing a network of computers to gather information that can be used to compromise its security controls.
2.3 User to Root Attack (U2R): a class of exploit in which the adversary starts out with access to a normal user account on the system (gained by sniffing passwords, a dictionary attack, or social engineering) and exploits some vulnerability to gain root access to the system.
2.4 Remote to Local Attack (R2L): occurs when an attacker who can send packets to a machine over a network, but who does not have an account on that machine, exploits some vulnerability to gain local access as a user of that machine.

III. TYPES OF INTRUSION DETECTION SYSTEMS

Two types of intrusion detection system are available: network-based and host-based.

3.1 A Network IDS (NIDS): A network IDS resides on a computer or appliance connected to a segment of an organization's network and monitors for ongoing attacks on that
segment of the network. Many different hashing algorithms, such as MD5, are used to protect the integrity of files in the network. When a network-based IDS is configured to recognize an attack, it responds by notifying administrators. A NIDS searches network traffic for attack patterns, such as a large volume of related packets that could indicate an ongoing denial-of-service attack, or a sequence of related packets that might indicate a port scan in progress. It can be deployed to monitor a host on any part of the network, or to monitor all traffic between the systems that make up the network from a specific point in the network (a router is one example). A NIDS is placed where it can see traffic going into and out of a particular network segment.

3.2 Host-based Intrusion Detection System (HIDS): A HIDS is installed on a single device or server, known as a host, and monitors the activity of that host only. Host-based intrusion detection technologies can also be divided into two categories: signature-based detection (i.e. misuse detection) and anomaly-based detection. A HIDS tracks the status of key system files and detects when an intruder creates, alters or deletes the monitored files. The HIDS then triggers an alert when a file attribute is changed, a new file is created, or an existing file is removed.

Misuse detection is automated, and the work is more sophisticated and precise than when it is performed manually. Depending on the robustness and seriousness of a signature that is triggered in the network, an alert should be sent to the appropriate authorities.

4.2 ANOMALY BASED INTRUSION DETECTION SYSTEM: An anomaly-based intrusion detection system provides a mechanism for identifying both network and computer intrusions and misuse by monitoring system behaviour and classifying it as normal or anomalous. The classification is based not on patterns or signatures but on rules, and seeks to detect any malicious activity that falls outside the normal operation of the system. Signature-based systems, in contrast, can detect only attacks for which a signature has previously been created. The advantages of this approach are the possibility of identifying new attacks as intrusions; the recognition of anomalies without prior knowledge of the causes and properties of new attacks; reduced dependence on attack signatures (in contrast to signature-based systems); and the ability to detect the misuse of user rights.
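
The anomaly-based approach of section 4.2 labels behaviour as normal or anomalous relative to a learned baseline instead of matching signatures. The following is a minimal sketch of that idea, assuming a single numeric feature (requests per minute, a hypothetical choice) and a z-score rule with a threshold of three standard deviations. Neither the feature nor the threshold is taken from the paper.

```python
from statistics import mean, stdev

def fit_baseline(normal_values):
    """Learn the mean and standard deviation of one feature from normal activity."""
    return mean(normal_values), stdev(normal_values)

def classify(value, baseline, threshold=3.0):
    """Label an observation by how many standard deviations it lies from the baseline."""
    mu, sigma = baseline
    if sigma == 0:  # constant baseline: any deviation at all is anomalous
        return "normal" if value == mu else "anomalous"
    z_score = abs(value - mu) / sigma
    return "anomalous" if z_score > threshold else "normal"

# Hypothetical requests-per-minute samples recorded during normal operation.
baseline = fit_baseline([48, 52, 50, 47, 53, 49, 51, 50])

print(classify(51, baseline))   # close to the baseline
print(classify(400, baseline))  # far outside the baseline
```

Because the rule depends only on the baseline, it can flag traffic that matches no known signature, at the cost of false alarms whenever legitimate behaviour drifts away from the baseline.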
helps us to detect various types of attacks. With proper real-time monitoring of network traffic, one can check whether there are any dangerous payloads.

5.3.2 Preprocessing: Pre-processing is the conversion of raw data into usable and efficient formats that are passed to the models for training and testing. The primary pre-processing step performed here is mapping the entries in the data set to the segregated attack classes, which are DoS, Probe, U2R and R2L; this is accomplished with a lambda function.

Therefore, set D is further divided into subsets M1 = {m11, m12, m13, …, m1w}, M2 = {m21, m22, m23, …, m2x}, M3 = {m31, m32, m33, …, m3y} and M4 = {m41, m42, m43, …, m4z} such that w + x + y + z = i.

information like source, destination, length, protocol, method, status code etc.

There are 41 features listed in the NSL KDD dataset. Since it is not possible to take all 41 features into account, we plotted a feature graph, which gives a clear picture of all features and their variation. Based on it, we decided to drop the last 9 features because their values do not change. Some columns, such as srv_rerror_rate, urgent, is_host_login, su_attempted, num_shells, and land, have only 0 as their value. We therefore decided to remove them from the dataset during preprocessing. Then we take the numeric attributes (the float16 and int16 columns), add them to a new numerical data frame, and scale them to have zero mean and unit variance.
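
The preprocessing steps described in 5.3.2 (mapping labels to attack classes with a lambda function, dropping columns whose value never changes, and scaling to zero mean and unit variance) can be sketched as follows. This is a plain-Python illustration rather than the data-frame tooling the paper appears to use. The label lists are only an illustrative subset of NSL-KDD labels, and the "Normal" and "Unknown" class names, the helper names, and the three sample rows are assumptions made for the example.

```python
from statistics import mean, pstdev

# Illustrative subset of NSL-KDD attack labels grouped into the four classes.
ATTACK_CLASSES = {
    "DoS": {"neptune", "smurf", "back", "teardrop", "pod", "land"},
    "Probe": {"ipsweep", "portsweep", "nmap", "satan"},
    "U2R": {"buffer_overflow", "rootkit", "loadmodule", "perl"},
    "R2L": {"guess_passwd", "ftp_write", "imap", "phf", "warezclient"},
}

# Lambda mapping a raw label to its attack class.
# "Normal" and "Unknown" are fallback names assumed for this sketch.
to_class = lambda label: next(
    (cls for cls, names in ATTACK_CLASSES.items() if label in names),
    "Normal" if label == "normal" else "Unknown",
)

def drop_constant_columns(rows):
    """Remove features whose value never changes across the data set."""
    keep = [k for k in rows[0] if len({r[k] for r in rows}) > 1]
    return [{k: r[k] for k in keep} for r in rows]

def standardize(rows):
    """Scale every feature to zero mean and unit variance (population variance)."""
    stats = {
        k: (mean([r[k] for r in rows]), pstdev([r[k] for r in rows]))
        for k in rows[0]
    }
    return [{k: (r[k] - stats[k][0]) / stats[k][1] for k in r} for r in rows]

# Hypothetical records with three of the 41 features.
rows = [
    {"duration": 0, "src_bytes": 100, "num_shells": 0},
    {"duration": 2, "src_bytes": 300, "num_shells": 0},
    {"duration": 4, "src_bytes": 200, "num_shells": 0},
]
labels = ["normal", "neptune", "ipsweep"]

classes = list(map(to_class, labels))
features = standardize(drop_constant_columns(rows))
print(classes)
print(sorted(features[0]))
```

Constant columns are dropped before scaling on purpose: a column with zero variance would cause a division by zero in the scaling step and carries no information for the classifier.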
5.4.2 Naive Bayes Classifier: In machine learning, naïve Bayes classifiers are a family of simple "probabilistic classifiers" based on applying Bayes' theorem with strong (naïve) assumptions of independence among the features. They are among the simplest Bayesian network models. Naïve Bayes has been widely researched since the 1960s. It was introduced to the text-retrieval community in the early 1960s, and remains a popular method for text categorization, the problem of judging documents as belonging to one category or another with word frequencies as the features. In this domain, with adequate preprocessing, it is competitive with more advanced methods, including support vector machines. It is also used for automatic medical diagnosis.

Results of Naive Bayes Classifier:

5.4.3 Decision Tree: A decision tree is a flowchart-like structure in which each internal node represents a "test" on an attribute (e.g., whether a coin flip comes up heads or tails), each branch represents the result of the test, and each leaf node represents a class label (the decision made after all attributes have been evaluated).

Decision tree classifiers are a well-known classification technique used in various pattern recognition problems such as image classification and character recognition. They are successful because of their high adaptability and computational efficiency, particularly for complex classification problems. Among the many typical supervised classification methods, decision tree classifiers perform above expectations.

Results of Decision Tree:

5.4.4 Random Forest Classifier: Random forest, as its name suggests, consists of a large number of individual decision trees that act as an ensemble. Each tree in the random forest outputs a class prediction, and the class with the most votes becomes the prediction of our model.

The reason this works so well is that the trees protect each other from their individual errors. While some trees will be incorrect, many other trees will be correct, so as a group the trees move in the right direction. The prerequisites for a random forest to perform well are therefore:
i. The features need to contain some real signal, so that models built using those features do better than random guessing.
ii. The predictions (and therefore the errors) made by the individual trees need to have low correlations with each other.

Results of Random Forest Classifier:

there are several more complex extensions. Logistic regression (or logit regression) estimates the parameters of a logistic model (a form of binary regression) in regression analysis.
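
A logistic model maps a weighted sum of the features to a probability through the logistic (sigmoid) function, and estimating its parameters means fitting those weights. The following is a minimal sketch that uses batch gradient descent on the log-loss. The learning rate, the number of epochs and the toy data are assumptions made for illustration and are not the paper's settings.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Probability that sample x belongs to the positive (attack) class."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def fit(samples, labels, lr=0.5, epochs=500):
    """Estimate weights and bias by batch gradient descent on the log-loss."""
    weights, bias = [0.0] * len(samples[0]), 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(samples, labels):
            error = predict_proba(weights, bias, x) - y
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Toy, already scaled feature vectors: label 1 = attack, 0 = normal.
X = [[-1.0, -0.5], [-0.8, -1.0], [1.0, 0.7], [0.9, 1.2]]
y = [0, 0, 1, 1]

w, b = fit(X, y)
print([round(predict_proba(w, b, x)) for x in X])
```

Rounding the predicted probability at 0.5 turns the model into a binary classifier that separates attack records from normal ones.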
A useful strategy for both classification and regression is to assign weights to the contributions of the neighbors, so that nearer neighbors contribute more to the average than more distant ones. A typical weighting scheme, for example, is to give each neighbor a weight of 1/d, where d is the distance to the neighbor.

Neighbors are taken from a set of objects for which the class (for k-NN classification) or the value of the object property (for k-NN regression) is known. This can be regarded as the training set for the algorithm, although no explicit training phase is required. A peculiarity of the k-NN algorithm is that it is sensitive to the local structure of the data.

Results of k-nearest neighbors:

Let there be a set R = {r1, r2, r3, …, rp}, where r1, r2, r3, …, rp are the patterns recognised by the model and every pattern has a distinct order and number of packets.

5.5 Detection Engine: The “Detection Engine” works in the real-world environment to analyze the pattern of packets and their content to recognize whether there is any sort of attack. If a packet matches any of the rules, it sends an appropriate alert message to the security administrator.

Set R = {r1, r2, r3, …, rp} contains various patterns of ordered packets, which are matched against the incoming packets P = {p1, p2, p3, …, pn}. If a sequence of packets from set P matches any pattern from set R, then that particular attack is detected.
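
The matching of incoming packets P against the pattern set R can be sketched as follows. This is only an illustration under stated assumptions: each packet is reduced to a hypothetical (protocol, flag) pair, the two patterns are invented toy examples, and a pattern is taken to match when its packets appear in P in the same order, not necessarily back to back. The paper does not specify any of these details.

```python
def contains_in_order(packets, pattern):
    """True if every packet of the pattern occurs in packets in the same order."""
    stream = iter(packets)
    # "step in stream" consumes the iterator, so each search resumes
    # after the position where the previous pattern packet was found.
    return all(step in stream for step in pattern)

def detect(packets, patterns):
    """Return the names of all patterns in R matched by the incoming packets P."""
    return [name for name, pattern in patterns.items()
            if contains_in_order(packets, pattern)]

# Hypothetical pattern set R (toy patterns, not real attack signatures).
R = {
    "syn_flood": [("tcp", "SYN"), ("tcp", "SYN"), ("tcp", "SYN")],
    "port_probe": [("tcp", "SYN"), ("tcp", "RST"), ("tcp", "SYN"), ("tcp", "RST")],
}

# Hypothetical incoming packets P.
P = [("tcp", "SYN"), ("udp", "-"), ("tcp", "SYN"), ("tcp", "ACK"), ("tcp", "SYN")]

print(detect(P, R))
```

In a running system, each name returned by `detect` would trigger the alert message to the security administrator described above.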
may disable the system by discarding packets so that they do not reach their target, or by closing the ports.

Alerts are logged to “Log Packet Analysis”, which writes the information out in a log-file format, such as tcpdump and .pcap files. Network administrators can monitor these log files for further inspection.

how they perform on a new set of data, known as out-of-sample test data

Comparison Table:
In the future, we will run more tests on our system and measure the resulting accuracies. We hope to improve the genetic algorithm to increase IDS precision. The current system only displays log information and uses no techniques to analyze the logs and extract information from them. Data mining techniques can be used to analyze the information in log files, supporting efficient decision-making to enhance the system. The current system detects only known attacks. It can be extended by integrating intelligence into it, so that it gains knowledge by analyzing growing traffic and learns new patterns of intrusion.

IX. CONCLUSION

The IDS offers basic detection technology to safeguard network systems that are connected directly or indirectly to the Internet. Ultimately, however, it is up to the network administrator to make sure that the network remains safe. An IDS does not protect the network completely from intruders, but it helps the network administrator track down malicious actors on the Internet whose aim is to breach the network and leave it vulnerable to attacks. Once firewall technology is deployed at the network perimeter, the IDS becomes a main component of security in many organizations. For traffic that does not cross the firewall, the IDS can provide protection against both outside users and internal attackers.

Cite this article as :
Jayesh Zala, Aditya Panchal, Advait Thakkar, Bhagirath Prajapati, Priyanka Puvar, "Intrusion Detection System using Machine Learning", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN : 2456-3307, Volume 6, Issue 3, pp.61-71, May-June-2020. Available at doi : https://doi.org/10.32628/CSEIT2062166
Journal URL : http://ijsrcseit.com/CSEIT2062166