Intrusion Detection System Using Machine Learning
Abstract— The use of computers and the internet has spread rapidly over the past few decades, and every day more people come to rely heavily on the internet. Security has therefore become an increasingly important focus in the field of information security. It is vital to design a powerful intrusion detection system (IDS) to prevent hackers and other intruders from gaining access to computer networks or systems. An IDS builds threat and attack detection capabilities into the computer system: misuse can be identified by comparing the observed pattern of activity against preset patterns, with a deviation between the two signalling a possible intrusion. An IDS is a hardware device or software application that monitors network and/or system activities for malicious behaviour or policy violations and generates reports for a management station. In this study, a machine learning approach is proposed as a paradigm for developing a network intrusion detection system. The experimental results show that the proposed strategy improves intrusion detection capability.

Keywords- Support vector machine, Machine learning, Network intrusion detection system, Host intrusion detection system, Intrusion prevention system, Intrusion detection system.

I. INTRODUCTION
Over the past few years, computer systems have been used increasingly to make consumers' lives easier and more convenient. However, as people take advantage of the capabilities and processing capacity of computer systems, security has become one of the most significant problems in computer science. Attackers frequently try to break into systems and act maliciously, for example by stealing vital information from a corporation, rendering systems unusable, or even destroying them. Internal attacks, such as pharming, distributed denial-of-service (DDoS), eavesdropping, and spear-phishing attempts, are often among the most difficult of all well-known attacks to identify, because firewalls and intrusion detection systems (IDSs) usually guard against attacks from the outside. At present, most systems authenticate users by checking a login pattern consisting of a user ID and a password.

For this reason, we propose in this study a security solution called the Internal Intrusion Detection and Protection System (IIDPS) [1], which recognises malicious behaviour carried out against a system at the system call (SC) level. IIDPS uses data mining and forensic profiling techniques to mine system call patterns (SC-patterns), defined as the longest system call sequences (SC-sequences) that have appeared repeatedly in a user's log file. SC-patterns can be used to identify malicious activity. The user's forensic features, compiled from the user's computer usage history, are defined as SC-patterns that appear frequently in the user's own submitted SC-sequences but are rarely used by other users. This information is gathered from the user's computer.
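The SC-pattern and forensic-feature definitions above can be illustrated with a small sketch. It is a minimal illustration under stated assumptions, not the IIDPS implementation: each user's log is assumed to be a plain list of system call names, SC-patterns are taken to be the longest contiguous windows of 2 to 6 calls that occur at least twice, and a pattern is treated as a forensic feature when at most 20% of its occurrences come from other users. The window limits, the 20% threshold, the matching score, and the helper names (`sc_patterns`, `forensic_features`, `profile_match`) are hypothetical choices made for this example.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def count_occurrences(log: Sequence[str], pattern: Pattern) -> int:
    """Number of positions where `pattern` appears as a contiguous run in `log`."""
    n = len(pattern)
    return sum(1 for i in range(len(log) - n + 1) if tuple(log[i:i + n]) == pattern)


def repeated_windows(log: Sequence[str], min_len: int = 2, max_len: int = 6,
                     min_count: int = 2) -> Counter:
    """Count every contiguous window of min_len..max_len calls; keep repeated ones."""
    counts: Counter = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(log) - n + 1):
            counts[tuple(log[i:i + n])] += 1
    return Counter({p: c for p, c in counts.items() if c >= min_count})


def sc_patterns(log: Sequence[str], **kwargs) -> Counter:
    """Keep only maximal repeated windows, i.e. the longest repeated SC-sequences."""
    repeated = repeated_windows(log, **kwargs)
    return Counter({
        p: c for p, c in repeated.items()
        # drop p if it is a contiguous part of a longer repeated window q
        if not any(len(q) > len(p) and count_occurrences(q, p) > 0 for q in repeated)
    })


def forensic_features(user: str, logs: Dict[str, List[str]],
                      max_other_share: float = 0.2) -> Dict[Pattern, int]:
    """SC-patterns frequent for `user` but rarely seen in other users' logs."""
    features = {}
    for p, own in sc_patterns(logs[user]).items():
        other = sum(count_occurrences(logs[u], p) for u in logs if u != user)
        if other / (own + other) <= max_other_share:  # hypothetical threshold
            features[p] = own
    return features


def profile_match(session: Sequence[str], features: Dict[Pattern, int]) -> float:
    """Hypothetical score: share of a user's forensic patterns seen in a new session.
    A low score suggests the session may not come from that user."""
    if not features:
        return 0.0
    return sum(count_occurrences(session, p) > 0 for p in features) / len(features)


logs = {
    "alice": ["open", "read", "write", "close", "open", "read", "write", "close", "stat"],
    "bob": ["open", "read", "close", "socket", "connect", "send",
            "socket", "connect", "send"],
    "carol": ["socket", "connect", "send", "close"],
}
alice_profile = forensic_features("alice", logs)
print(alice_profile)                   # {('open', 'read', 'write', 'close'): 2}
print(forensic_features("bob", logs))  # {} because carol also uses bob's pattern
print(profile_match(["open", "read", "write", "close"], alice_profile))  # 1.0
```

In this toy example, Bob's repeated `socket, connect, send` sequence is rejected as a forensic feature because Carol also uses it, which mirrors the "rarely used by other users" condition in the definition.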
Authorized licensed use limited to: FLORIDA INTERNATIONAL UNIVERSITY. Downloaded on August 18,2023 at 18:36:02 UTC from IEEE Xplore. Restrictions apply.
2023 International Conference on Computer Communication and Informatics (ICCCI ), Jan. 23 – 25, 2023, Coimbatore, INDIA
II. LITERATURE SURVEY

The Internet of Things (IoT) plays a crucial role in online technology by providing the general public with fast and intelligent services. IoT devices, and the networks they rely on, must be protected to ensure security. In one study, a new intrusion detection approach was proposed to enable secure communication in wireless environments and to provide security for the IoT. It introduced a feature selection strategy that combines the Conditional Random Field (CRF) and spider monkey optimization (SMO) to locate the most useful features in the dataset [2]. The IDS is obtained by combining feature selection with data classification. To reduce the size of the dataset so that only the truly necessary features are used, these features must be prioritised during feature selection. During classification, the reduced dataset is split into two distinct groups according to the observed condition, for example "normal" and "abnormal" (attack). In this approach, the CRF makes the initial selection of contributing features [3], the SMO then finalises the usable features from the reduced feature set, and a convolutional neural network (CNN) distinguishes attack scenarios from normal conditions in the dataset.

Every organisation's network infrastructure is constantly at risk from a wide variety of threats, including infiltration, zero-day exploits, malware, and security breaches, which are made possible by the widespread use of the internet and smart devices. Network traffic therefore requires constant surveillance by a network intrusion detection system (NIDS). Another study proposed a novel hybrid approach to attack classification and intrusion detection that addresses high false-positive and false-negative rates with a three-part solution. First, the dataset is pre-processed using min-max normalisation and data transformation. Second, random forest recursive feature elimination is used to find the features that most improve model performance. Third, an Adaptive Neuro-Fuzzy Inference System (ANFIS) [4] and several kinds of Support Vector Machines (SVMs) classify probe, U2R, R2L, and DDoS attacks in order to detect infiltration. The validation rate of the method, computed with a Fine Gaussian Support Vector Machine (FGSVM), was 99.3 percent for the binary classification task.

Owing to recent attacks on computer networks, data security has become the most important and critical component of all organisational data systems, with effects that reach as far as the international monetary system. The intrusion detection system (IDS) is the most widely used tool for addressing network security concerns. Because IDSs monitor system performance and raise notifications whenever unexpected behaviour occurs, the system administrator must respond to these warnings as soon as they are received [5].

The work in [6] presented a statistical Naive Bayesian approach for IDSs in a variety of settings, such as analysing HTTP service-based traffic and distinguishing normal HTTP connections from attacks; it was motivated by the need for an improved method of detecting HTTP attacks. To determine which statistical approach is the most effective and efficient at identifying various types of attacks, the approaches are compared on performance parameters. IDS research [7, 8, 9] has also introduced and discussed the Multivariate Statistical Analysis (MSA) method combined with Naive Bayesian filtering. This line of work aims to build an advanced, more efficient IDS that reduces the number of false alarms generated. Reducing both false-positive and false-negative alarms improves network security and increases the attack detection rate [10].

An intrusion detection system (IDS) is software that monitors a network for malicious or unauthorised activity that could breach the security policies protecting the system's confidentiality, integrity, and availability. In one thesis, in-depth literature reviews were conducted on the many types of IDS [11], anomaly detection techniques, and machine learning algorithms available for detecting and categorising intrusions, and the construction of a hybrid IDS based on machine learning was suggested as a result. Integrating appropriate machine learning algorithms into existing detection systems can improve attack detection and categorisation. The authors also compared various machine learning algorithms by testing them in a simulated environment to evaluate how effective each one is. It is imperative that production facilities have adequate protection against cyberattacks. Because attacks are increasing daily, network safety should be the top priority for any industrial control system, and software is an essential component of industrial control systems.

Because of these vulnerabilities, communication exchanges must now include cybersecurity precautions. The mechanisms in place for secure communication, along with the broader security measures against these threats, need to be improved to keep pace with ever-evolving threats to information security. Based on the findings of this study, the construction of a versatile and dependable network intrusion
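The min-max pre-processing step in the three-part hybrid approach surveyed in Section II rescales each feature with x' = (x - min) / (max - min), using the minimum and maximum learned from the training data. A minimal standard-library sketch follows; clipping unseen test values into [0, 1] and mapping constant features to 0 are assumptions of this sketch, not details specified by the surveyed work.

```python
from typing import List, Sequence, Tuple


def fit_min_max(rows: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Learn the (min, max) of every feature column from the training rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def transform_min_max(rows: Sequence[Sequence[float]],
                      ranges: Sequence[Tuple[float, float]]) -> List[List[float]]:
    """Scale each feature to [0, 1] with x' = (x - min) / (max - min)."""
    scaled_rows = []
    for row in rows:
        scaled = []
        for x, (lo, hi) in zip(row, ranges):
            value = 0.0 if hi == lo else (x - lo) / (hi - lo)  # constant column -> 0
            scaled.append(min(1.0, max(0.0, value)))  # clip values outside training range
        scaled_rows.append(scaled)
    return scaled_rows


train = [[0, 10, 5], [50, 20, 5], [100, 30, 5]]
ranges = fit_min_max(train)
print(transform_min_max(train, ranges))  # [[0.0, 0.0, 0.0], [0.5, 0.5, 0.0], [1.0, 1.0, 0.0]]
print(transform_min_max([[150, 10, 7]], ranges))  # [[1.0, 0.0, 0.0]]
```

Fitting the ranges on training data only, and reusing them for test data, keeps information from the test set out of the pre-processing step.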
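To make the Naive Bayesian approach of [6] concrete, the sketch below classifies connections as "normal" or "attack" from categorical features by combining a class prior with per-feature likelihoods, summed as logarithms. It is a generic categorical Naive Bayes with Laplace smoothing, not the exact model of [6]; the three HTTP-style features (request method, URI length bucket, presence of an SQL keyword) and the toy training rows are invented for illustration.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


class CategoricalNaiveBayes:
    """Naive Bayes over categorical features with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha

    def fit(self, rows: Sequence[Sequence[Hashable]], labels: Sequence[Hashable]):
        self.class_counts = Counter(labels)
        n_features = len(rows[0])
        # value_counts[c][j][v]: how often feature j took value v in class c
        self.value_counts: Dict[Hashable, List[Counter]] = {
            c: [Counter() for _ in range(n_features)] for c in self.class_counts
        }
        self.vocab = [set() for _ in range(n_features)]
        for row, c in zip(rows, labels):
            for j, v in enumerate(row):
                self.value_counts[c][j][v] += 1
                self.vocab[j].add(v)
        return self

    def log_scores(self, row: Sequence[Hashable]) -> Dict[Hashable, float]:
        """log P(c) + sum_j log P(x_j | c) for every class c."""
        total = sum(self.class_counts.values())
        scores = {}
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / total)
            for j, v in enumerate(row):
                k = len(self.vocab[j]) + 1  # +1 reserves a smoothed slot for unseen values
                count = self.value_counts[c][j][v]
                score += math.log((count + self.alpha) / (n_c + self.alpha * k))
            scores[c] = score
        return scores

    def predict(self, row: Sequence[Hashable]) -> Hashable:
        scores = self.log_scores(row)
        return max(scores, key=scores.get)


# Hypothetical features: (HTTP method, URI length bucket, SQL keyword in query)
rows = [("GET", "short", "no"), ("GET", "short", "no"), ("POST", "short", "no"),
        ("GET", "long", "yes"), ("POST", "long", "yes"), ("GET", "long", "no")]
labels = ["normal", "normal", "normal", "attack", "attack", "attack"]
model = CategoricalNaiveBayes().fit(rows, labels)
print(model.predict(("GET", "long", "yes")))   # attack
print(model.predict(("POST", "short", "no")))  # normal
```

Working in log space avoids numerical underflow when many features are multiplied, and the smoothing keeps an unseen value (for example a "PUT" request) from zeroing out a class.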
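The false-alarm terminology used in Section II can be pinned down with the usual confusion-matrix rates, treating "attack" as the positive class: detection rate = TP / (TP + FN), false-positive rate = FP / (FP + TN), and false-negative rate = FN / (FN + TP) = 1 - detection rate. The helper below is an illustrative sketch; the label strings and the toy predictions are assumptions.

```python
from typing import Dict, Sequence


def alarm_rates(y_true: Sequence[str], y_pred: Sequence[str],
                positive: str = "attack") -> Dict[str, float]:
    """Detection, false-positive and false-negative rates plus accuracy for an IDS."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # attack correctly raised as an alarm
            else:
                fp += 1  # false alarm on normal traffic
        elif truth == positive:
            fn += 1      # missed attack
        else:
            tn += 1      # normal traffic correctly left alone

    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0

    return {
        "detection_rate": ratio(tp, tp + fn),
        "false_positive_rate": ratio(fp, fp + tn),
        "false_negative_rate": ratio(fn, fn + tp),
        "accuracy": ratio(tp + tn, len(y_true)),
    }


y_true = ["attack"] * 4 + ["normal"] * 6
y_pred = ["attack", "attack", "attack", "normal", "attack"] + ["normal"] * 5
rates = alarm_rates(y_true, y_pred)
print(rates["detection_rate"], rates["false_negative_rate"])  # 0.75 0.25
```

Reporting the false-positive and false-negative rates separately, rather than accuracy alone, matters for IDS evaluation because normal traffic usually far outnumbers attacks.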
learning. Nevertheless, this dataset has many problems. One of the primary drawbacks of the KDD dataset is that it contains a significant number of duplicate records: 78% of the records in the training set and approximately 75% of the records in the test set are duplicated, which biases learning methods toward the frequently repeated records. The NSL-KDD dataset is a publicly available dataset for evaluating machine learning algorithms for network intrusion detection. It is a revised version of the KDD Cup 99 dataset whose training and test sets contain no duplicate records; they were created by retaining only the relevant records of the original KDD dataset. As a direct consequence, NSL-KDD is regarded as a standard dataset, allowing research evaluations to be consistent across studies. The training dataset covers four categories of attack: Denial of Service (DoS), Probe, User to Root (U2R), and Remote to Local (R2L). These categories in turn comprise more than 21 different attacks.

V. CONCLUSION

In this work, we propose a security system called the Internal Intrusion Detection and Protection System (IIDPS), which identifies hostile behaviour aimed at a system at the system call (SC) level. The IIDPS mines system call patterns (SC-patterns), defined as the longest system call sequences (SC-sequences) that have appeared repeatedly in a user's log file. These sequences correspond to operations carried out by the user, such as sending, updating, or viewing a file, and they are validated by an administrator. The user's forensic features, compiled from the user's computer usage history, are defined as SC-patterns that appear frequently in the user's own submitted SC-sequences but are rarely used by other users. This information is gathered from the user's computer.

REFERENCES
[1] D. P. Gaikwad and R. C. Thool, "Intrusion detection system using bagging with partial decision tree base classifier," Procedia Comput. Sci., vol. 49, pp. 92–98, 2015.
[2] A. Yulianto, P. Sukarno, and N. A. Suwastika, "Improving AdaBoost-based Intrusion Detection System (IDS) Performance on CIC IDS 2017 Dataset," J. Phys. Conf. Ser., vol. 1192, no. 1, 2019, doi: 10.1088/1742-6596/1192/1/012018.
[3] I. Sharafaldin, A. H. Lashkari, and A. A. Ghorbani, "Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization," Proc. 4th Int. Conf. Inf. Syst. Secur. Priv., pp. 108–116, 2018.
[4] V. Hajisalem and S. Babaie, "A hybrid intrusion detection system based on ABC-AFS algorithm for misuse and anomaly detection," Comput. Networks, vol. 136, pp. 37–50, 2018.
[5] A. A. Aburomman and M. B. I. Reaz, "A survey of intrusion detection systems based on ensemble and hybrid classifiers," Comput. Secur., vol. 65, pp. 135–152, 2017.
[6] S.-Y. Wu and E. Yen, "Data mining-based intrusion detectors," Elsevier, 2009.
[7] A. K. Shrivas and A. K. Dewangan, "An Ensemble Model for Classification of Attacks with Feature Selection based on KDD99 and NSL-KDD Data Set," Int. J. Comput. Appl., vol. 99, no. 15, 2014.
[8] Y.-T. Hou et al., "Malicious web content detection by machine learning," Elsevier, 2010.
[9] P. Maniriho, "Detecting Intrusions in Computer Network Traffic with Machine Learning Approaches," Apr. 2020, doi: 10.22266/ijies2020.0630.39.
[10] K. M. Sudar, P. Nagaraj, P. Deepalakshmi, and P. Chinnasamy, "Analysis of Intruder Detection in Big Data Analytics," 2021 International Conference on Computer Communication and Informatics (ICCCI), 2021, pp. 1–5, doi: 10.1109/ICCCI50826.2021.9402402.
[11] K. M. Sudar, M. Beulah, P. Deepalakshmi, P. Nagaraj, and P. Chinnasamy, "Detection of Distributed Denial of Service Attacks in SDN using Machine learning techniques," 2021 International Conference on Computer Communication and Informatics (ICCCI), 2021, pp. 1–5, doi: 10.1109/ICCCI50826.2021.9402517.
[12] V. Praveena, A. Vijayaraj, P. Chinnasamy, I. Ali, R. Alroobaea, et al., "Optimal Deep Reinforcement Learning for Intrusion Detection in UAVs," CMC-Computers, Materials & Continua, vol. 70, no. 2, pp. 2639–2653, 2022.