
JOIV : Int. J. Inform. Visualization, 8(3) - September 2024 1230-1238


INTERNATIONAL JOURNAL ON INFORMATICS VISUALIZATION
journal homepage : www.joiv.org/index.php/joiv

Collaborative Intrusion Detection System with Snort Machine Learning Plugin
Dimas Febriyan Priambodo a,*, Achmad Husein Noor Faizi b, Fika Dwi Rahmawati b, Septia Ulfa Sunaringtyas a, Jeckson Sidabutar a, Tiyas Yulita a
a Cybersecurity Engineering, National Cyber and Crypto Polytechnic, Ciseeng, Bogor, Indonesia
b National Cyber and Crypto Agency, Bojongsari, Depok, Indonesia
Corresponding author: *[email protected]

Abstract—The increasing prevalence of cybercrime and cyber-attacks underscores the imperative need for organizations to implement
robust network security measures. Nevertheless, current Intrusion Detection Systems (IDS) often rely on a single sensor or on multiple sensors
of the same IDS type, either Host-Based IDS (HIDS) or Network-Based IDS (NIDS), which inherently limits their detection
capabilities. To address this limitation, this research combines NIDS and HIDS components into a collaborative-IDS system, thus
expanding the scope of intrusion detection and enhancing the efficacy of the established attack mitigation system. However, the
integration of NIDS and HIDS introduces formidable challenges, notably the elevated rates of False Positive and False Negative alerts.
To surmount these challenges, the researcher employs machine learning techniques in the form of Snort plugins and comparison
methods to heighten the precision of attack detection. The obtained results unequivocally illustrate the effectiveness of this approach.
A Support Vector Machine applied to static analysis of the NSL-KDD dataset attains a 99% detection rate for Denial of
Service (DoS) attacks and a 98% detection rate for Probe attacks. Furthermore, in dynamic real-time attack simulations,
the machine learning plugins exhibit remarkable proficiency in detecting various types of DoS attacks, concurrently offering more
comprehensive identification of SYN Flooding DoS attacks compared to the Snort community rules set. These findings signify a
significant advancement in intrusion detection, paving the way for more robust and accurate network security systems in an era of
escalating cyber threats.

Keywords— Artificial intelligence; machine learning; NIDS; HIDS; snort plugin.

Manuscript received 5 Aug. 2023; revised 25 Dec. 2023; accepted 19 Jan. 2024. Date of publication 30 Sep. 2024.
International Journal on Informatics Visualization is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.

I. INTRODUCTION

Based on its scope, intrusion detection is divided into the Network-Based Intrusion Detection System (NIDS) and the Host-Based Intrusion Detection System (HIDS). The NIDS is designed to observe the traffic within specific network segments or devices and analyze network protocols, transport, and applications. Its primary purpose is to detect abnormal or suspicious activity within the targeted network segments or devices. HIDS operates by monitoring the attributes of a host system and capturing events transpiring within it to identify any potentially suspicious or anomalous activity [1].

In supervised machine learning, a dataset is necessary to train and test the attack mitigation system being built so that it achieves the required detection accuracy. This study focuses explicitly on DoS attack data and Probe attack data. Consequently, the NSL-KDD Dataset is chosen for this research due to its inclusion of labeled data comprising both training and test data, making it a complex dataset [2]. This aligns with the trends outlined in [3], as the study employs the second most widely used dataset. Flask is used to run predictions on the dataset and to visualize the system's packet detection results. Once predictions on the dataset succeed, Flask also serves as the main page for predictions on real-time network traffic that has been reconstructed and analyzed by a custom-built plugin.

Previous research [4] indicates that Snort requires fewer computing resources to handle 10 Gbps network traffic than Suricata, while Suricata can process a larger volume of packets per second than Snort. However, both show a relatively high rate of false positive alarms. Although Snort is regarded as an exceptional IDS in various aspects, it still exhibits a significant occurrence of false positive alarms. Machine learning techniques were introduced to address this issue, and adaptive plug-ins were developed to operate alongside the

Snort rule set. Research [5] on open-source IDS (Snort, Suricata, Zeek, and OSSEC) reveals that open-source IDSs exhibit enhanced efficiency in detection accuracy and slightly reduced false positive alarms.

Research [6] proposes a machine learning-based behavioral approach for analyzing HTTP data. The proposed framework aims to identify malicious activities within the HTTP protocol using Suricata sensor data. To optimize performance and accuracy, this system uses a supervised machine-learning technique in the form of a decision tree. The system aims to detect malicious activity by analyzing attack patterns and behavior, relying on the signs or traces produced during an attack. Research [7] presents a framework for a Host-based Intrusion Detection System (HIDS) that aims to detect intrusive system processes by analyzing the frequency of n-gram terms in system call trace files. The proposed framework begins by converting the system call trace files into a model representing n-gram vectors. Subsequently, a dimension reduction process is applied to reduce the size of the input vector by selecting only those n-gram terms with frequencies exceeding a predetermined threshold. The reduced vectors are then processed by various classifiers (Naive Bayes, SVM, MLP, and C4.5 Decision Tree) to determine whether the corresponding system call trace files are classified as normal or malicious. The experimental findings demonstrate that the proposed framework effectively identifies normal and intrusive system processes while minimizing the computational overhead typically associated with HIDS. Collaborative IDSs have been implemented for IoT in [8], with results showing more reliable and effective security.

In contrast to [9]–[12], which propose neural networks, [13], which uses random forests, and [14], which uses reinforcement learning, this research finds the support vector machine to give the best results. Previous research has primarily focused on improving accuracy by utilizing multiple sensors [15], [16]. However, this research takes a different approach by implementing a multi-intrusion detection system in a prototype that combines NIDS and HIDS. Additionally, it utilizes Flask to display real-time predictions of incoming packet flows and employs the Elastic Stack as a centralized log management dashboard. The best-performing algorithm in the training model, Support Vector Machine (SVM), is implemented as a Snort plugin to enable further development.

II. MATERIALS AND METHOD

A. Wazuh

Wazuh is an open-source tool that functions as a HIDS (endpoint) [17]. Wazuh can perform log analysis, file integrity checks, intrusion and malware detection, Windows registry monitoring, rootkit detection, configuration assessment, vulnerability detection, and time-based alerts, and has an active response to attacks. Wazuh can also protect cloud, container, and server workloads. To support this capability, Wazuh requires three core components [18]: the agent, the server, and the Elastic Stack.

Each component in Wazuh comes with its own functions and modules. The Wazuh Agent has an architecture in which different components of each agent handle their own tasks, including modules for monitoring file systems, reading messages from logs, collecting inventory data, scanning system configurations, looking for malware, etc. Users can activate or deactivate a module through the configuration settings found on the Wazuh Server. The agent communicates with the server to transmit data and collect security-related events. In addition, agents can transmit operational data, report configuration, and agent status. Once connected, agents can be upgraded, monitored, and configured remotely via a server [19]. The data visualization process can be supported and simplified through an agent connected to the server. Wazuh requires a visualization component, namely the Elastic Stack, which is included in Wazuh's installation requirements. The Elastic Stack, as described previously, is a combination of three applications [20]. Filebeat is a lightweight forwarding application that transmits logs over a network to Elasticsearch. It can extract output from the Wazuh analytics engine and securely send events and alerts in real time via an encrypted channel. Elasticsearch, on the other hand, is a highly adaptable search and analytics engine that excels in handling full-text data. It operates in a distributed manner, partitioning index data into shards, each potentially having multiple replicas. Kibana is a flexible, user-friendly web interface that facilitates data collection, analysis, and visualization. This application operates on top of the indexed content within the Elasticsearch cluster.

B. Snort

Snort is a NIDS known for its real-time network traffic analysis and packet recording capabilities on IP networks. It offers many functionalities, including protocol analysis, content search, and matching. Snort is also proficient in detecting multiple types of attacks and attempts, such as port sweeps, buffer overflows, stealth port scans, Common Gateway Interface (CGI) attacks, server message block (SMB) protocol probes, and OS fingerprinting [21]. In its implementation, Snort is divided into multiple components, each responsible for detecting particular types of attacks and producing the desired output format from the detection system [22].

Each detection function in Snort is based on a specified ruleset. If a ruleset is not defined for a particular attack type, Snort cannot detect that specific attack. Rulesets ensure that Snort's detection capabilities are aligned with the specified attack signatures and patterns. Snort is generally ruleset-based and does not use machine learning; in this research, a Snort machine learning plugin was developed.

C. Machine Learning (ML)

Machine Learning (ML) is a subset of Artificial Intelligence (AI) that involves configuring systems to acquire knowledge and emulate human thinking processes [23]. ML techniques aim to replicate cognitive processes like logical reasoning and intuition, drawing insights from past experiences, conducting experiments, and making generalizations [4]. The system receives initial training, during which the algorithm learns from data and its classifications. The system's ultimate objective is to make future decisions autonomously. ML operates with a degree of probability, leveraging analyzed data and adapted decisions. Prediction lies at the core of ML, enabling the

anticipation of future events based on past occurrences. Currently, there exist two main types of ML [24]. Supervised Machine Learning is a category of ML that operates with datasets containing inputs and their corresponding outputs. Through this approach, supervised ML constructs algorithms that represent the dataset. On the other hand, Unsupervised Machine Learning does not involve a training phase with labeled datasets. In this model, the system makes decisions based on its configured structure.

Much research related to IDS uses machine learning. According to [25], supervised machine learning algorithms focused on classification include the Support Vector Machine (SVM), which shares similarities with the traditional multilayer perceptron neural network. Another algorithm, K-means, addresses the widely recognized clustering problem and represents a straightforward, unsupervised learning approach. Neural Networks (NN) can handle multiple regression and classification tasks simultaneously, although typically each network specializes in a single task.

The primary focus of this study involves analyzing data associated with Denial of Service (DoS) attacks and Probe attacks. Hence, the NSL-KDD Dataset is chosen for this investigation because it provides labeled training and test data, making it more intricate [7]. The combined data comprise 125,973 records in the training dataset and 22,544 records in the test dataset.

D. Method

The experimental method is employed in this study, drawing inspiration from NIST SP 800-94 and adapting it to suit the requirements of utilizing machine learning in the system. The research design, depicted in Fig. 1, involves using three machine-learning algorithms applied to the NSL-KDD dataset. These algorithms are compared to Snort, aiming to identify the most effective machine learning approach for real-time detection of Denial of Service (DoS) and Probe attacks. Additionally, a platform for detecting DoS and Probe attacks in real time is designed using Flask. The outcomes of this phase will be utilized to construct an attack mitigation system with a high degree of detection accuracy.

A. Needs Analysis

The collaborative-IDS system consists of three virtualized computer devices: the multi-IDS host, the Wazuh Agent host, and a host that plays the role of the attacker, as depicted in Table I. The Attacker Host uses a virtualized Ubuntu operating system to support installing and implementing attack scenarios against the multi-IDS server in order to obtain attack logs based on the ruleset applied to NIDS Snort. The Wazuh Agent host is built using virtualized Windows 10 to collect activity logs on this host so that the multi-IDS system receives complex event-based activity data. Creating a separate Wazuh Agent host supports more exploration, and the activities or events carried out on this host do not impact the multi-IDS server.
TABLE I
HARDWARE DESIGN
Specification     Main Host        Multi-IDS      Attacker Host   Wazuh-Agent
Operating System  Windows 10 Pro   Ubuntu 20.04   Ubuntu 20.04    Windows 10 Pro
RAM               16 GB            8 GB           2 GB            2 GB
Disk Storage      1 TB             128 GB         64 GB           64 GB

B. Collaborative IDS Integration

This stage consists of installing several applications, including Wazuh and Elasticsearch. Elasticsearch must be configured according to the users, and their roles, that are allowed to use it. Once the plugin has been added successfully, the Wazuh SSL certificate must be registered on Elasticsearch so that Wazuh and Elasticsearch can connect over HTTPS (Hypertext Transfer Protocol Secure). Elasticsearch also needs to be configured to communicate with the Wazuh server node, which requires specifying the network host, node name, and cluster.

Log data stored on the Wazuh server needs to be visualized in Elasticsearch. This requires Filebeat, with a template added so that the alerts generated by Wazuh can be read optimally and the Wazuh module added so that the log data can be read. Filebeat must also use an SSL certificate to connect over HTTPS. Kibana is used to visualize and manage the data entering Elasticsearch. After installation, a folder is created to store Kibana data, and the Wazuh plugin and SSL certificate are installed so that Kibana can read data from Wazuh.
Fig. 1 Research Prediction Stage

Snort requires several additional dependencies from outside the Ubuntu repository, as well as Tcmalloc to optimize memory allocation on the host where Snort is installed. It is also necessary to set up the

network interface that Snort will use to listen to any network traffic that occurs on it. The community rules are the basic ruleset that Snort uses as a reference for detecting incoming attacks. They require OpenAppID, which Snort uses as a module to identify the types of applications that are running and captured in Snort's activity log. Logging is configured in the Snort configuration file “snort.lua”. Snort must be integrated with Filebeat so that the attack data obtained through Snort can be visualized and integrated with the installed Elastic Stack. Forwarding of the logs received from Snort to the Elastic Stack can be configured directly in the Filebeat configuration file by defining Snort's log locations. The last step is to create an index pattern in Kibana to visualize the data coming from the Snort log.

C. Data Processing

Before making predictions on the dataset, it is crucial to perform data preprocessing. Data preprocessing plays a significant role in data analysis as it focuses on removing unwanted data disturbances and improving the desired data characteristics [26]. Preprocessing starts with importing the dataset, followed by label modification for the training data. The One-Hot Encoding technique is employed to convert categorical attributes into binary properties. For optimal execution of One-Hot Encoding, the input provided to the transformer should be an integer matrix that represents the categorical (discrete) values. The resulting output will be a sparse matrix where each column represents a possible value. It is assumed that the input attribute takes values within the range [0, n_values) [27]. Hence, to assign a numerical value to each category, the property first needs to undergo Label Encoding. Label Encoding involves the transformation of labels into a numeric format, enabling them to be represented in a form that a machine can process [28]. In this research, the LabelEncoder function is applied to the categorical features of the training and test data, as shown in Table II. One-Hot Encoding then follows in Table III: each value of the parent feature is split off into a new feature, and each newly formed feature takes the value 0 or 1.

TABLE II
LABEL ENCODING

No  protocol_type  service   flag
0   tcp            ftp_data  SF
1   udp            other     SF
2   tcp            private   S0
3   tcp            http      SF
4   tcp            http      SF

No  protocol_type  service  flag
0   1              20       9
1   2              44       9
2   1              49       5
3   1              24       9
4   1              24       9

TABLE III
ONE-HOT ENCODING

No  protocol_type_icmp  protocol_type_tcp  protocol_type_udp  service_irc  service_x11  service_x39_50  service_aol  service_auth  service_bgp  service_courier
0   0.0                 1.0                0.0                0.0          0.0          0.0             0.0          0.0           0.0          0.0
1   0.0                 0.0                1.0                0.0          0.0          0.0             0.0          0.0           0.0          0.0
2   0.0                 1.0                0.0                0.0          0.0          0.0             0.0          0.0           0.0          0.0
3   0.0                 1.0                0.0                0.0          0.0          0.0             0.0          0.0           0.0          0.0
4   0.0                 1.0                0.0                0.0          0.0          0.0             0.0          0.0           0.0          0.0
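
The two encoding steps summarized in Tables II and III can be sketched in plain Python. This is a minimal, dependency-free illustration and not the authors' code: the helper names `label_encode` and `one_hot` are hypothetical, and the category list for `protocol_type` is assumed to be icmp, tcp, and udp, as the one-hot columns of Table III suggest.

```python
def label_encode(values, categories):
    """Map each category to its index in the sorted category list."""
    ordered = sorted(categories)
    index = {category: i for i, category in enumerate(ordered)}
    return [index[v] for v in values], ordered


def one_hot(codes, n_categories):
    """Expand integer codes into one 0/1 column per category."""
    return [[1.0 if code == k else 0.0 for k in range(n_categories)]
            for code in codes]


# protocol_type values of rows 0-4 in Table II.
protocol_type = ["tcp", "udp", "tcp", "tcp", "tcp"]

# Assumption: the full category set is icmp/tcp/udp (columns of Table III).
codes, categories = label_encode(protocol_type, ["icmp", "tcp", "udp"])
rows = one_hot(codes, len(categories))

print(categories)  # ['icmp', 'tcp', 'udp']
print(codes)       # [1, 2, 1, 1, 1]
for row in rows:
    print(row)     # first row: [0.0, 1.0, 0.0]
```

Because the codes are assigned from the sorted category list, `tcp` becomes 1 and `udp` becomes 2, which agrees with the protocol_type column of Table II. In scikit-learn, the corresponding steps would typically use `LabelEncoder` followed by `OneHotEncoder` from `sklearn.preprocessing`.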

Feature scaling, or normalization, is used to change the scale of each feature so that it can be adjusted to the new features that are created [29]. In this research, feature scaling is done by dividing the data frame into X and Y variables, where the X variables are the properties (features) and the Y variable is the result (label). In Table IV, the researcher classifies the attack data contained in the dataset by type. After splitting the dataset, it is necessary to determine the correlation between the features in the dataset, as shown in Fig. 2, to find out which feature variables do or do not influence the detection results when conducting experiments on the dataset. Green indicates a high degree of correlation, while red indicates the opposite; this makes it easier for researchers to build machine learning models.

TABLE IV
SPLITTING LABEL

No      label
0       normal
1       normal
2       dos
3       normal
4       normal
…       …
125968  dos
125969  normal
125970  normal
125971  dos
125972  normal
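
The X/Y split, the scaling step, and the correlation check described above can be illustrated with a short, dependency-free sketch. Everything in it is an assumption made for illustration: the five toy records use invented values (only the column names `src_bytes` and `count` are NSL-KDD feature names), the binary target marks `dos` rows, and the Pearson coefficient stands in for whatever correlation measure was used to draw the heatmap.

```python
from math import sqrt

# Toy records: the numbers are invented for illustration, not taken from NSL-KDD.
records = [
    {"src_bytes": 491, "count": 2, "label": "normal"},
    {"src_bytes": 146, "count": 13, "label": "normal"},
    {"src_bytes": 0, "count": 123, "label": "dos"},
    {"src_bytes": 232, "count": 5, "label": "normal"},
    {"src_bytes": 0, "count": 255, "label": "dos"},
]
feature_names = ["src_bytes", "count"]

# Split the data into X (properties) and Y (result).
X = [[float(rec[name]) for name in feature_names] for rec in records]
y = [1 if rec["label"] == "dos" else 0 for rec in records]


def min_max_scale(matrix):
    """Rescale every column of the matrix to the range [0, 1]."""
    columns = list(zip(*matrix))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)]
            for row in matrix]


def pearson(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / sqrt(var_a * var_b)


X_scaled = min_max_scale(X)
corr = {name: pearson([row[j] for row in X_scaled], y)
        for j, name in enumerate(feature_names)}
for name, value in corr.items():
    print(f"{name}: {value:+.3f}")
```

In this toy data, `count` correlates positively with the DoS label and `src_bytes` negatively, which is the kind of signal a correlation heatmap makes visible before a model is chosen.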

Fig. 2 Heatmap result

D. Training Model Comparison

Comparison models must be tested to provide empirical data so that a suitable machine learning algorithm can be added to the Collaborative IDS. Training models are built with several algorithms: SVM, K-means, and Neural Networks. The SVM addresses a binary classification problem within a supervised learning setting and can handle many patterns. Utilizing a hyperplane, the SVM separates instances belonging to different classes and ensures that all instances lie outside the margin. This results in a strict margin, which can be mathematically represented as follows:

$$ y_i (w \cdot x_i + b) \ge 1, \quad 1 \le i \le n, \quad x_i \in \mathbb{R}^d \qquad (1) $$

Let $x_i$ represent the instances, $y_i \in \{-1, 1\}$ indicate the labels of these instances, $b$ denote the intercept term, $w$ represent the normal vector to the hyperplane, $d$ be the dimension of the input vector, and $n$ be the number of input data points. In real-world scenarios, locating the hyperplane may not be feasible due to the presence of outliers, that is, instances that deviate significantly from others within the same class. To address this issue, the concept of a soft margin was introduced and is expressed as follows:

$$ y_i (w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad 1 \le i \le n \qquad (2) $$

where $\xi_i$ represents slack variables that enable instances to deviate from the margin. The process of finding the optimal soft margin involves the following minimization, where $C$ is the penalty parameter that weights the slack variables:

$$ \min_{w, b, \xi} \; \frac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i \qquad (3) $$

The parameter $\gamma$ determines the extent to which a single training example affects the model, with low values indicating a far reach and high values indicating a close reach [30].

In the K-means training model, the number of clusters for the flooding attack data is set to four. The centroid position of each cluster is determined using the Euclidean distance from each training point to each centroid, and each training point is assigned to its nearest centroid to form k clusters:

$$ d(x_i, c_k) = \sqrt{\sum_{j=1}^{d} (x_{ij} - c_{kj})^2} \qquad (4) $$

The mean of each of the k clusters is then recalculated and replaces the original centroid. These steps are repeated until the centroid values no longer change [31]:

$$ c_k = \frac{1}{N_k} \sum_{x_i \in S_k} x_i \qquad (5) $$

where $S_k$ is the set of training points assigned to cluster $k$ and $N_k$ is the number of points in it.

In this research, a Neural Network is utilized, and the sklearn.preprocessing package in Python is employed. Specifically, a Multi-layer Perceptron (MLP) is used with a hidden layer size of 20. For binary classification, the outputs

of the MLP, represented as f(x), are passed through the logistic function in formula 6. This transformation ensures that the output values fall within the range of zero to one. Subsequently, a threshold of 0.7 is set, meaning that any output equal to or greater than 0.7 is assigned to the positive class, while the remaining outputs are assigned to the negative class.

$$ g(z) = \frac{1}{1 + e^{-z}}, \quad z = f(x) \qquad (6) $$

E. Cross Validation

Cross-validation is a statistical method used to evaluate and compare machine learning algorithms by dividing the data into two segments. One segment is used to train the model, while the other is used to validate it [32]. This study uses cross-validation to assess machine learning algorithms based on four metrics: accuracy, precision, recall, and F-measure. Table V shows the confusion matrix, followed by the formulation of accuracy in formula 7, precision in formula 8, recall in formula 9, and F1 Score in formula 10.

TABLE V
CONFUSION MATRIX

                   Predicted Positive    Predicted Negative
Actual Positive    True Positive (TP)    False Negative (FN)
Actual Negative    False Positive (FP)   True Negative (TN)

$$ \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \qquad (7) $$

$$ \text{Precision} = \frac{TP}{TP + FP} \qquad (8) $$

$$ \text{Recall} = \frac{TP}{TP + FN} \qquad (9) $$

$$ \text{F1 Score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (10) $$

F. Snort Machine Learning Plug-in Integration

Snort must both detect attacks and ensure that the installed machine learning works optimally, so a feature extractor plugin [33] is used to extract a subset of the NSL-KDD features from real-time network traffic and from network traffic capture (PCAP) files. To integrate Snort with machine learning, the Hyperscan and Flatbuffers modules are installed on Snort to normalize and label the raw packet header data, and the DPX plugin [34] can be used to analyze the binary output produced by feature extraction and reconstruction on real-time network traffic packets, which is then used to predict received packets with machine learning. Fig. 3 shows all the Snort plugins that have been installed. The Snort plugin was developed using the best machine learning algorithm identified in the model testing phase. The detailed algorithm is described in the SVM_NSL algorithm.

Fig. 3 Snort plugin list

ALGORITHM SVM_NSL

    # Constructor
    FUNCTION __init__(self):
        CALL DefaultNSL.__init__(self)

    # Static method to load data from a file
    FUNCTION load_data(filepath):
        # Preselected features using InfoGain Ranker
        infogain_ind <- [3, 4, 5, 6, 10, 11, 12, 13, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26]

        data <- Read CSV data from 'filepath' using 'COL_NAMES' and 'USE_COL_NAMES'

        # Shuffle data
        data <- Shuffle data randomly
        labels <- Extract 'labels' from data
        data <- Extract features from data based on 'infogain_ind'

        nom_ind <- [0]
        FOR num in nom_ind:
            # Convert nominal features to category codes
            data[:, num] <- Convert data[:, num] to category type and then to category codes

        # Scale all data to [0-1]
        scaler <- Create a MinMaxScaler object
        data <- Scale data using the scaler and assign column names from data

        RETURN [data, labels]

    # Method to train the classifier
    FUNCTION train_clf(self):
        train_data, train_labels <- Extract 'training' data and labels from self
        bin_labels <- Convert train_labels to binary labels using 'ATTACKS' mapping
        self.clf <- Create a LinearSVC classifier with C=8
        self.clf.fit(train_data, bin_labels)

    # Method to test the classifier
    FUNCTION test_clf(self, train=False):
        IF train is True:
            data, labels <- Extract 'training' data and labels from self
        ELSE:
            data, labels <- Extract 'testing' data and labels from self
        test_preds <- Predict labels using self.clf on 'data'

        RETURN [test_preds, 10]

    # Method to predict using the classifier
    FUNCTION predict(self, packet):
        # Preselected features using InfoGain Ranker
        infogain_ind <- [3, 4, 5, 6, 10, 11, 12, 13, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26]

        data <- Create DataFrame 'data' from 'packet' using 'USE_COL_NAMES'

        # Shuffle data
        data <- Shuffle data randomly

        labels <- Extract 'labels' from data
        data <- Extract features from data based on 'infogain_ind'
        nom_ind <- [0]
        FOR num in nom_ind:
            # Convert nominal features to category codes
            data[:, num] <- Convert data[:, num] to category type and then to category codes

        # Scale all data to [0-1]
        scaler <- Create a MinMaxScaler object
        data <- Scale data using the scaler and assign column names from data

        # Predict using the trained classifier
        predict <- Predict labels using self.clf on 'data'

        RETURN predict
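
As a rough illustration of the train, predict, and evaluate flow in the SVM_NSL pseudocode, the sketch below trains a linear soft-margin SVM and computes the metrics of formulas 7 to 10. It is a minimal stand-in under stated assumptions, not the plugin's implementation: the classifier is trained with a hand-written subgradient descent on the soft-margin objective of formula 3 rather than with the `LinearSVC` solver the pseudocode names, the six training rows are invented values for two NSL-KDD feature names (`count`, `serror_rate`), and all function names are hypothetical. Unlike the pseudocode, which creates a new scaler inside `predict`, the sketch reuses the training minima and maxima, on the assumption that a single packet must be scaled consistently with the training data.

```python
def fit_min_max(rows):
    """Return per-column minima and maxima of the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def scale(row, lows, highs):
    """Min-max scale one row to [0, 1] using the training minima and maxima."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)]


def decision(w, b, x):
    """Signed score w.x + b of a single scaled row."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, C=8.0, lr=0.001, epochs=5000):
    """Minimise 0.5*||w||^2 + C*sum(max(0, 1 - y_i*(w.x_i + b))) by subgradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0      # gradient of the 0.5*||w||^2 term
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1.0:   # margin violated: slack is active
                grad_w = [g - C * yi * xj for g, xj in zip(grad_w, xi)]
                grad_b -= C * yi
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 (attack) or -1 (normal) for one scaled row."""
    return 1 if decision(w, b, x) >= 0 else -1


def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 Score as in formulas 7 to 10."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return accuracy, precision, recall, f1


# Invented toy rows [count, serror_rate]; labels: -1 = normal, +1 = DoS.
raw_train = [[2, 0.0], [13, 0.0], [5, 0.1], [123, 1.0], [255, 1.0], [200, 0.9]]
labels = [-1, -1, -1, 1, 1, 1]

lows, highs = fit_min_max(raw_train)
X_train = [scale(row, lows, highs) for row in raw_train]
w, b = train_linear_svm(X_train, labels)

preds = [predict(w, b, x) for x in X_train]
tp = sum(1 for p, t in zip(preds, labels) if p == 1 and t == 1)
tn = sum(1 for p, t in zip(preds, labels) if p == -1 and t == -1)
fp = sum(1 for p, t in zip(preds, labels) if p == 1 and t == -1)
fn = sum(1 for p, t in zip(preds, labels) if p == -1 and t == 1)
print("accuracy, precision, recall, F1:", metrics(tp, tn, fp, fn))

# A new packet is scaled with the training statistics before prediction.
print("new packet:", predict(w, b, scale([180, 0.95], lows, highs)))
```

The value C=8 follows the pseudocode; the learning rate and the number of epochs are arbitrary choices for this toy data and would need tuning on real traffic.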

III. RESULTS AND DISCUSSION

A. Collaborative IDS Integration

The results of the collaborative IDS system implementation are based on the logs of the running system services, which show that the background processes of the multi-IDS system work optimally and without constraints. Filebeat, which is installed on the Collaborative-IDS system, is capable of monitoring incoming logs. The researchers use Logstash as a link between the system input logs and Filebeat itself. Sending the attack data logs obtained from Snort requires Logstash to normalize the attack logs so that they match the format of logs originating from Snort and can become input logs in Filebeat, which are then visualized in Kibana. The visualization system needs a database to store attack data logs, so the researchers use Elasticsearch as the database management system that indexes the attack data. Meanwhile, the Snort monitoring page was created by configuring an index pattern for Snort. It was combined with Kibana's monitoring page to make it easy to visualize attack data objects, because Snort monitoring does not need to be as complex as the monitoring carried out on Wazuh.

B. Model Comparison

Based on the conducted experiments, the results of attack detection accuracy are presented in Table VI. The findings indicate that the SVM algorithm exhibits the highest accuracy compared to the other algorithms. The SVM algorithm achieves an accuracy of 97.7% on the training data and 97.5% on the test data. In contrast, the K-Means algorithm demonstrates an accuracy of 88.3% on the training data and 76.1% on the test data. Furthermore, the Neural Network algorithm achieves a training data accuracy of 96.7% and a test data accuracy of 78.4%.

TABLE VI
DATASET ACCURACY COMPARISON

Algorithm Model          Training  Test
Support Vector Machine   97.7%     97.5%
K-Means                  88.3%     76.1%
Neural Network           96.7%     78.4%

Table VII breaks these findings down by attack type. The Neural Network algorithm, which leverages the relationship between features, reaches an accuracy of 77.7% for DoS attacks and 86.3% for Probe attacks. The cluster-based K-Means algorithm improves on this, achieving 80.4% for DoS attacks and 99.9% for Probe attacks. However, the Support Vector Machine (SVM) algorithm demonstrates the most consistent detection performance, achieving an accuracy of 99.3% for DoS attacks and 98.4% for Probe attacks.

TABLE VII
ACCURACY BASED ON ATTACK TYPE

Algorithm model          DoS Attack  Probe Attack
Neural Network           77.7%       86.3%
K-Means                  80.4%       99.9%
Support Vector Machine   99.3%       98.4%

C. Cross-Validation and False Positive/Negative

In Table VIII, the detection accuracy reaches 99%, the precision reaches 99%, the recall reaches 99%, and the F-Measure reaches 99% for DoS attacks. For Probe attacks, the accuracy reaches 98%, the precision reaches 96%, the recall reaches 98%, and the F-Measure reaches 97%. For the R2L attack type, SVM obtains an accuracy of 96%, a precision of 94%, a recall of up to 96%, and an F-Measure of up to 95%. For the U2R attack type, SVM obtains an accuracy of 99%, a precision of 91%, a recall of 82%, and an F-Measure of 84%.

TABLE VIII
CROSS-VALIDATION RESULT

Attack  Accuracy  Precision  Recall  F-Measure
DoS     99%       99%        99%     99%
Probe   98%       96%        98%     97%

Static attack data testing uses previously normalized test data and is carried out separately for each type of attack. For DoS attacks, 6101 records were true positives, 9455 were true negatives, 256 were false positives, and 1359 were false negatives, as shown in Table IX. For Probe attacks, 1136 records were true positives, 9576 were true negatives, 135 were false positives, and 1285 were false negatives, as shown in Table X.

TABLE IX
TESTING RESULT DOS ATTACKS

                  Predicted 0  Predicted 1
Actual Attacks 0  9455         256
Actual Attacks 1  1359         6101

TABLE X
TESTING RESULT PROBE ATTACK

                  Predicted 0  Predicted 1
Actual Attacks 0  9576         135
Actual Attacks 1  1285         1136

D. Real-time Attack Prediction

For the machine learning monitoring page, we use a pre-configured Flask server with a plugin connected to Snort to predict attack packets in real time. This page divides the received packets into four columns: normal packets, DoS attack packets, Probe attack packets, and total packets. The test scenario involves the real-time transmission of DoS attacks and Probe attacks to the multi-IDS host. This test aims to compare the results obtained from the multi-IDS logs with the data collected by the Flask Web Application in Fig. 4. The Flask Web Application plays a crucial role in ensuring that the machine learning-based attack detection on the multi-IDS system enhances the real-time accuracy of attack detection. Testing DoS attacks involves ICMP-type packets sent by the attacker host to the multi-IDS host, specifically utilizing ICMP flooding to send DoS packets. Table XI compares the incoming attack data with the multi-IDS logs, which record an attack data count of 11,065, matching the machine learning logs. This confirms that the machine learning plugin strengthens and validates the

1236
accuracy of the attack logs received by the multi-IDS system, resulting in the absence of false positives or false negatives in the multi-IDS system.

Fig. 4 Machine Learning Plugin GUI

TABLE XI
REAL-TIME DOS ATTACK PREDICTION
Packet Flow        Machine Learning Plugin Detection   IDS Log
Incoming Packets   11,065                              11,065
Total Packets      11,065                              11,065

DoS attacks that use TCP packets are known as TCP SYN Flooding DoS attacks [36]. This particular attack leads to an overwhelming influx of TCP protocol requests to the systems, resulting in decreased performance. If an excessive number of TCP SYN Flooding packets are sent, it can potentially cause a denial of service and severely impact the functionality. To illustrate the attack, a probe attack test was conducted in which TCP and UDP packets were employed to conduct IP sweeps and port sweeps. These techniques are frequently employed to scan IP addresses and identify open ports on the targeted system [35]. Nmap was utilized to conduct network sweeps by automating the delivery of SYN Flooding packets. The researchers employed the Nmap command, commonly used for IP and port sweeps in network scanning.
A total of 1,932 packets were sent to the multi-IDS system to evaluate the classification of the sent packets as probe attacks. The attack log indicated that the multi-IDS system received 1,932 incoming packets associated with the probe attack. The machine learning classification differs for these packets: it shows 1,838 DoS packets and 94 probe packets, as shown in Table XII. These results indicate that the machine learning plugin is capable of more comprehensive detection than the attack log based on predefined rulesets from Snort.

TABLE XII
REAL-TIME PROBE ATTACK PREDICTION
Packet Flow        Machine Learning Plugin                            Multi-IDS Log
Incoming Packets   1,838 incoming DoS packets and 94 Probe packets    1,932 Probe packets
Total Packets      1,932 packets detected                             1,932 packets detected

IV. CONCLUSION
The implementation of Elasticsearch, Logstash, Filebeat, Kibana, Snort, and Wazuh as a collaborative IDS with machine learning successfully detects attacks such as DoS, Probe, U2R, and R2L statically and is capable of detecting DoS attacks and Probe attacks dynamically through simulated attacks carried out in real time. The Snort NIDS detected a total of 12,997 attack events, consisting of 11,065 DoS attacks and 1,932 Probe attacks, and the Wazuh HIDS detected a total of 563 events, among them 12 attempts to log in to valid accounts, one attempted rootkit-type attack, and five attacks using superuser privileges.
In real-time scenarios, the SVM machine learning algorithm, in the form of a plugin, successfully identifies all 11,065 DoS attacks without false positives or false negatives. The plugin also provides a more detailed classification, which helps the technician choose the proper response when an attacker scans the network, as seen in the probe attack detection test. While the Snort ruleset detected 1,925 Probe attacks, machine learning identified 1,195 DoS attacks based on SYN Flooding attacks and 730 Probe attacks, showcasing enhanced detection capabilities.
In future research, this machine learning plugin can be integrated into the ELK stack to simplify monitoring and realize centralized logs. Containerization can also be implemented to expand the impact of the research conducted.

ACKNOWLEDGMENT
Thanks to National Cyber and Crypto Polytechnic for supporting this submission.

REFERENCES
[1] K. A. Scarfone and P. M. Mell, “Guide to Intrusion Detection and Prevention Systems (IDPS),” National Institute of Standards and Technology, 2007. doi: 10.6028/nist.sp.800-94.
[2] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, “A Detailed Analysis of the KDD CUP 99 Data Set,” in IEEE Symposium on Computational Intelligence, 2009.
[3] S. Kumar, S. Gupta, and S. Arora, “Research Trends in Network-Based Intrusion Detection Systems: A Review,” IEEE Access, vol. 9, pp. 157761–157779, 2021, doi: 10.1109/access.2021.3129775.
[4] S. A. R. Shah and B. Issac, “Performance comparison of intrusion detection systems and application of machine learning to Snort system,” Future Generation Computer Systems, vol. 80, pp. 157–170, Mar. 2018, doi: 10.1016/j.future.2017.10.016.
[5] F. A. Vadhil, M. F. Nanne, and M. L. Salihi, “Importance of Machine Learning Techniques to Improve the Open Source Intrusion Detection Systems,” Indonesian Journal of Electrical Engineering and Informatics (IJEEI), vol. 9, no. 3, Sep. 2021, doi: 10.52549/ijeei.v9i3.3219.
[6] A. Ghasempour, “HTTP based Network Intrusion Detection System by Using Machine Learning-Based Classifier,” Tallinn, 2021.
[7] B. Subba, S. Biswas, and S. Karmakar, “Host Based Intrusion Detection System Using Frequency Analysis of N-Gram Terms,” in IEEE Region 10 Conference (TENCON) Malaysia, 2017.
[8] I. Wahidah, Y. Purwanto, and A. Kurniawan, “Collaborative intrusion detection networks with multi-hop clustering for internet of things,” International Journal of Electrical and Computer Engineering (IJECE), vol. 11, no. 4, p. 3255, Aug. 2021, doi: 10.11591/ijece.v11i4.pp3255-3266.
[9] K. Farhana, M. Rahman, and Md. T. Ahmed, “An intrusion detection system for packet and flow based networks using deep neural network approach,” International Journal of Electrical and Computer Engineering (IJECE), vol. 10, no. 5, p. 5514, Oct. 2020, doi: 10.11591/ijece.v10i5.pp5514-5525.
[10] T. A. Jasim Ali and M. M. Taher Jawhar, “Detecting network attacks model based on a convolutional neural network,” International Journal of Electrical and Computer Engineering (IJECE), vol. 13, no. 3, p. 3072, Jun. 2023, doi: 10.11591/ijece.v13i3.pp3072-3078.
[11] N. Yoshimura, H. Kuzuno, and Y. Shiraishi, “DOC-IDS : A Deep Learning-Based Method for Feature,” 2022.
[12] S. A. Albelwi, “An Intrusion Detection System for Identifying Simultaneous Attacks using Multi-Task Learning and Deep Learning,” 2022 2nd International Conference on Computing and Information Technology (ICCIT), Jan. 2022, doi: 10.1109/iccit52419.2022.9711630.
[13] “A Hybrid IDS Using GA-Based Feature Selection Method and Random Forest,” International Journal of Machine Learning and Computing, vol. 12, no. 2, Mar. 2022, doi: 10.18178/ijmlc.2022.12.2.1077.
[14] Z. U. A. Tariq, E. Baccour, A. Erbad, M. Guizani, and M. Hamdi, “Network Intrusion Detection for Smart Infrastructure using Multi-armed Bandit based Reinforcement Learning in Adversarial Environment,” 2022 International Conference on Cyber Warfare and Security (ICCWS), Dec. 2022, doi: 10.1109/iccws56285.2022.9998440.
[15] F. A. Saputra, M. Salman, K. Ramli, A. Abdillah, and I. Syarif, “Big data analysis architecture for multi IDS sensors using memory based processor,” 2017 International Electronics Symposium on Knowledge Creation and Intelligent Computing (IES-KCIC), vol. 4, pp. 40–45, Sep. 2017, doi: 10.1109/kcic.2017.8228456.
[16] B. Kerim, “Securing IoT Network against DDoS Attacks using Multi-agent IDS,” Journal of Physics: Conference Series, vol. 1898, no. 1, p. 012033, Jun. 2021, doi: 10.1088/1742-6596/1898/1/012033.
[17] Wazuh, “Wazuh - Components · Wazuh documentation.” https://documentation.wazuh.com/current/getting-started/components/index.html
[18] Wazuh, “Wazuh agent - Components · Wazuh documentation.” https://documentation.wazuh.com/current/getting-started/components/wazuh-agent.html (accessed Nov. 11, 2021).
[19] Wazuh, “Wazuh server - Components · Wazuh documentation.” https://documentation.wazuh.com/current/getting-started/components/wazuh-server.html (accessed Nov. 11, 2021).
[20] Wazuh, “Wazuh Elastic Stack.” https://documentation.wazuh.com/current/getting-started/components/elastic_stack.html (accessed Nov. 11, 2021).
[21] M. G., S. Prabu, and L. B. C., “Detecting DDoS Attack,” Applications of Artificial Intelligence for Smart Technology, pp. 55–66, 2021, doi: 10.4018/978-1-7998-3335-2.ch004.
[22] R. U. Rehman, Intrusion Detection Systems with Snort Advanced IDS Techniques Using Snort, Apache, MySQL, PHP, and ACID. New Jersey: Pearson Education Inc., 2003. [Online]. Available: http://www.phptr.com
[23] J. Frank, “Artificial Intelligence and Intrusion Detection: Current and Future Directions,” 1994.
[24] M. Tiwari, R. Kumar, A. Bharti, and J. Kishan, “Intrusion Detection System,” International Journal of Technical Research and Applications, vol. 5, no. 2, pp. 38–44, 2017. [Online]. Available: www.ijtra.com
[25] A. Singh, N. Thakur, and A. Sharma, “A review of supervised machine learning algorithms,” in 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), 2016, pp. 1310–1315.
[26] T. Al-Shehari and R. A. Alsowail, “An Insider Data Leakage Detection Using One-Hot Encoding, Synthetic Minority Oversampling and Machine Learning Techniques,” Entropy, vol. 23, no. 10, p. 1258, Sep. 2021, doi: 10.3390/e23101258.
[27] A. Gupta, “One Hot EnCoding | Data Science and Machine Learning | Kaggle.” https://www.kaggle.com/discussions/getting-started/114797 (accessed Nov. 11, 2022).
[28] “ML | Label Encoding of datasets in Python - GeeksforGeeks.” https://www.geeksforgeeks.org/ml-label-encoding-of-datasets-in-python/ (accessed Oct. 15, 2022).
[29] A. Zheng and A. Casari, Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists, 1st ed. California: O’Reilly Media, Inc., 2018.
[30] I. Stromberger, N. Bacanin, and M. Tuba, “Hybridized krill herd algorithm for large-scale optimization problems,” 2017 IEEE 15th International Symposium on Applied Machine Intelligence and Informatics (SAMI), Jan. 2017, doi: 10.1109/sami.2017.7880356.
[31] G. Zhang and E. Li, “Research on IDS Snort Based on Classic Clustering Algorithm,” 2020 International Conference on Urban Engineering and Management Science (ICUEMS), Apr. 2020, doi: 10.1109/icuems50872.2020.00147.
[32] P. Refaeilzadeh, L. Tang, and H. Liu, “Cross-Validation,” Encyclopedia of Database Systems, pp. 532–538, 2009, doi: 10.1007/978-0-387-39940-9_565.
[33] Artificially Intelligent Intrusion Detection System, “kdd99_feature_extraction,” Github, 2022. https://github.com/AIIDS/kdd99_feature_extractor (accessed Mar. 20, 2022).
[34] Snort, “DPX Readme.” https://snort.org/documents/dpx-readme (accessed Mar. 20, 2022).
[35] K. Labib and V. Rao Vemuri, “Detecting Denial-of-Service And Network Probe Attacks Using Principal Component Analysis,” pp. 1–8, 2011.
[36] C. L. Schuba, I. V. Krsul, M. G. Kuhn, E. H. Spafford, A. Sundaram, and D. Zamboni, “Analysis of a denial of service attack on TCP,” Proceedings. 1997 IEEE Symposium on Security and Privacy (Cat. No.97CB36097), 1997, doi: 10.1109/secpri.1997.601338.