Application of Machine Learning in The Optimization of Firewall Parameters Based On Attack Detection
Abstract: This paper explores the optimization of firewall parameters for attack detection using machine learning techniques, focusing on improving network security in dynamic environments. Traditional firewall systems often face limitations in detecting malicious traffic due to static rule sets and high false positive rates, particularly in real-world scenarios with evolving attack patterns. To overcome these challenges, this study applies neural networks with the rectified linear unit activation function (ReLU), which enables precise attack detection and real-time firewall policy adjustments. The proposed 5-5-4 neural network model, tested using real-world datasets, achieved an accuracy of 96.3%, outperforming alternative configurations. The analysis evaluated three scenarios: normal conditions, active attacks, and post-policy adjustment, confirming the effectiveness in enhancing detection and mitigation capabilities. The results highlight the potential of machine learning, particularly neural networks, as a robust tool to improve network security. This approach enables future integration of IoT and real-time threat monitoring.

Index Terms: DDoS attack, firewall, machine learning, neural networks, Orange

The manuscript was received on October 1st, 2024.
Dragan JEVTIĆ, Infrastructure of Serbian Railway, Nemanjina 6, Belgrade, Serbia, [email protected]; Information Technology Studies, University of Criminal Investigation and Police Studies, Cara Dušana 196, Belgrade, Serbia.
Petar ČISAR, Information Technology Studies, University of Criminal Investigation and Police Studies, Cara Dušana 196, Belgrade, Serbia, [email protected].

1. INTRODUCTION

The growing importance of security in computer networks highlights the need for robust monitoring of end devices, network infrastructure, and firewall systems. Firewalls play a vital role in this landscape by controlling network traffic and regulating packet transmission between zones (network areas), with the ability to inspect packets when necessary. Firewalls are generally classified into two main types: edge firewalls, which manage Internet access, and data centre firewalls, which secure access to server infrastructures. While edge firewalls focus on controlling external connections, data centre firewalls are specifically designed to protect internal resources within the data centre environment.

From an implementation perspective, firewalls can be categorized as software-based, hardware-based, or cloud-based solutions. Each type offers a comparable level of protection but comes with its own limitations and trade-offs. Beyond their implementation, it is crucial to consider the types of firewall devices in relation to their intended functionality. The primary categories include packet filtering firewalls, stateful inspection firewalls, and proxy firewalls. In addition, specialized solutions such as application firewalls, web application firewalls, and virtual infrastructure firewalls are available.

A firewall's primary function is to filter network traffic based on predefined policies. These policies are shaped by organizational requirements, network architecture, and configuration needs, while also taking into account real-time network conditions. Policies are defined using attributes such as source and destination ports, network address translation (NAT), bytes sent and received, packets sent and received, and specific actions to be taken [1].

This paper aims to demonstrate the optimization of an Internet firewall using machine learning (ML). For this purpose, a dataset collected from an Internet firewall was analysed using the Orange software package [2]. Orange was selected for its extensive graphical components and algorithms, making it a powerful tool for data analysis and modelling.

Machine learning, along with neural networks, is increasingly recognized as providing exceptional results in this field. Neural networks, inspired by biological nerve cells, are designed to meet the computational needs of systems that use this technology. These networks can be classified into cellular, layered, and fully connected structures [3].

Developing neural networks involves several key stages. The process begins with the dataset, followed by the design of the system, which includes tasks such as dataset preprocessing, defining the network topology, setting parameters, and selecting activation functions. This stage encompasses loading and filtering the dataset,
specifying the type of network along with its topology, link type, link order, and weight range. It also involves defining the characteristics of individual nodes and determining the system's dynamics, including the initial weight scheme, activation equations, calculations, and the learning algorithm. After the design stage, the system must be trained and finally tested. The test dataset should be set aside. If we use a multilayer perceptron neural network, certain errors that occur during classification can be reintroduced back into the network to modify the network parameters.

Machine learning algorithms can be divided into three categories: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning, the most essential form of machine learning, is characterized by data consisting of pairs of descriptions of what is being learned and what needs to be learned. On the other hand, unsupervised learning is defined by the absence of predefined labels or outcomes, focusing instead on identifying patterns or structures within the data. However, reinforcement learning is employed when a sequence of actions is required to find a solution to a problem. The most widely used model is the supervised learning model, where the input variable is derived from an input dataset and the focus is on identifying patterns within the data. Figure 1 illustrates the main classification of machine learning algorithms.

Figure 1: Classification of machine learning algorithms [1].

Neural networks are a branch of machine learning that is key to deep learning based on artificial intelligence. Neural networks (NN) consist of mutually densely connected processing elements (neurons) organized according to specific architectures. A NN performs information processing with learning and generalization characteristics based on the training dataset. It is applied mainly to problems that require a clearly defined function that connects the input and output data [4].

This paper deals with neural networks with supervised learning. The general principle of the operation of neural networks is shown in Figure 2.

Figure 2: Artificial neuron [5].

Supervised learning defines the outputs with a specific mapping of the input (x) and output (y). Model creation begins by feeding a large amount of data to the input, which starts the model's training. A large amount of input data allows the model to be more accurate. The high accuracy of the model enables adequate prediction of the output values when test data are provided to the model.

Each neuron has its input, which is connected through synapses. The inputs, shown as xi where i = 1...n, can be new unprocessed input data or data sent from another neuron. The strength of the connection between neurons and individual input data is not arbitrary but is precisely defined by synaptic weights. These weights play a crucial role in the transfer function, which receives each input multiplied by its synaptic weight. The result of the transfer function is then compared to a threshold value, triggering an activation function that gives a result of 1 or 0, the final output of the neuron [6].

The inputs received by neurons are followed by an activation function. Activation functions introduce non-linearity to the neural network, allowing it to model more complex classifications. These functions vary depending on the mathematical formulas used, determining how the input data is processed. The choice of activation function is crucial in the development of neural networks, as it directly affects the learning speed, as well as the accuracy and performance of the network [5]. Figure 3 provides examples of activation functions, including the sigmoid, the rectified linear unit activation function (ReLU), and the hyperbolic tangent (tanh).

The sigmoid function is not zero-centred and requires an exponent calculation. The corresponding formula for the function is found in [8]:

Sigmoid(x) = 1 / (1 + e^(-x))    (1)

Another problem with this function arises during the training of neural networks. The gradients, which indicate how much a parameter should be adjusted to reduce the error, gradually decrease and eventually approach zero. This factor makes training the model difficult. Gradient saturation is mitigated by normalizing the data and weights and by using activation functions that do not lead to rapid saturation, such as ReLU.
The corresponding formula for the ReLU function is [9]:

ReLU(x) = x if x > 0; 0 otherwise    (2)

The hyperbolic tangent is an odd, monotonically increasing function. The function is centred at zero and is bounded between -1 and 1. The corresponding formula for the function is:

Tanh(x) = sinh(x) / cosh(x) = (e^x − e^(−x)) / (e^x + e^(−x))    (3)

The hyperbolic tangent function also has a problem with gradient saturation. For inputs of large magnitude, whether positive or negative, the gradients approach zero and effectively vanish.

Figure 3: Examples of activation functions: sigmoid, ReLU, and hyperbolic tangent.

For negative inputs, ReLU can be modified with a small slope constant, which prevents 'dead' ReLUs and accelerates learning by improving balance. The modified function, known as Leaky ReLU, is shown in Figure 4a.

The Maxout function uses the maximum value within a group of linear parts. Unlike ReLU, which compares the value to zero, Maxout compares it to the highest value within the candidate group.

The exponential linear unit (ELU) is another variation of the ReLU function, offering improved performance for values of x < 0 (Figure 4b). It shares similar properties with ReLU, but avoids the problem of dead ReLUs [3]. The corresponding formula for the ELU function is [9]:

ELU(x) = x if x > 0; a(e^x − 1) otherwise    (4)

Figure 4: Examples of activation functions a) Leaky ReLU and b) ELU [10].
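As a minimal illustration of equations (1)-(4), the sketch below implements the discussed activation functions with NumPy. The slope constant of Leaky ReLU and the parameter a of ELU are illustrative defaults, not values taken from this paper.

    import numpy as np

    def sigmoid(x):
        # Equation (1): not zero-centred, saturates for large |x|
        return 1.0 / (1.0 + np.exp(-x))

    def tanh(x):
        # Equation (3): zero-centred, bounded between -1 and 1
        return np.tanh(x)

    def relu(x):
        # Equation (2): passes positive inputs, zeroes out the rest
        return np.maximum(0.0, x)

    def leaky_relu(x, slope=0.01):
        # Leaky ReLU: small slope for x < 0 prevents 'dead' neurons
        return np.where(x > 0, x, slope * x)

    def elu(x, a=1.0):
        # Equation (4): smooth negative branch instead of a hard zero
        return np.where(x > 0, x, a * (np.exp(x) - 1.0))

    x = np.linspace(-3, 3, 7)
    print(relu(x), leaky_relu(x), elu(x))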
                               Sigmoid   Tanh   ReLU   ELU
  Lack of adaptability         yes       yes    yes    yes
  Computational inefficiency   yes       yes    no     partial

Neural network learning is carried out using algorithms known as backpropagation, which calculate the gradients necessary for adjusting the weight values in the network. Some of the most used optimization algorithms include Adam, Adagrad, Nadam, vanilla gradient descent, and stochastic gradient descent, among others. In this paper, the authors focus on the Adam algorithm (Adaptive Moment Estimation), which not only adapts its learning rate but also incorporates momentum into the update by acting on the first-order gradient [6].
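The following NumPy sketch shows a single Adam update step, using the commonly cited default constants; it is intended only to illustrate the momentum and per-parameter learning-rate terms mentioned above, not the exact implementation used in the later experiments.

    import numpy as np

    def adam_step(w, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        # First moment: exponentially decayed average of gradients (momentum)
        m = beta1 * m + (1 - beta1) * g
        # Second moment: exponentially decayed average of squared gradients
        v = beta2 * v + (1 - beta2) * g ** 2
        # Bias-corrected estimates for early steps
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # Update with an effective per-parameter learning rate
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        return w, m, v

    w = np.zeros(3); m = np.zeros(3); v = np.zeros(3)
    g = np.array([0.1, -0.2, 0.05])   # gradient obtained from backpropagation
    w, m, v = adam_step(w, g, m, v, t=1)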
2. RELATED WORKS

The analysis of datasets, particularly freely available general datasets, and the use of appropriate algorithms have mainly been the focus of researchers. Along with various software packages, attack detection systems are commonly used to identify transmission anomalies, as discussed in numerous studies [11-23].

In his work, Maslan [11] focused on the concept of detecting Distributed Denial-of-Service (DDoS) attacks using machine learning techniques, applied to a real web server. According to the study, the reason for investigating this type of attack is that DDoS attacks account for 79% of all attacks in Malaysia. The techniques used in the study include naive Bayes (NB), random forest (RF), neural network (NN), support vector machine (SVM), and k-nearest neighbour (k-NN). The input dataset consists of typical attributes such as the source and destination addresses, packet size, packet type, and total number of packets. The analysis revealed that the highest accuracy was achieved using the random forest and NN algorithms, reaching 98.70% with both. In comparison, other algorithms performed slightly worse, with naive Bayes achieving 97.96%, SVM 98.41%, and k-NN 97.63%. The NN used in the study consisted of two hidden layers in a 4-4 configuration.

Studies such as Maslan [11] and Najafimehr [12] have focused mainly on traditional machine learning algorithms. Najafimehr used two publicly available, open-source datasets: the CICIDS2017 dataset (DDoS subset) for training and CICDDoS2019 for testing. Both datasets contained 42 attributes each. Machine learning algorithms, including decision trees, random forests, naive Bayes, and support vector machines, were applied for the analysis. Among these, the random forest algorithm demonstrated the highest precision, achieving 99.6%, compared to the decision tree (DT) algorithm, which had a precision of 99.3%.

When comparing Maslan's study, which used real data, with Najafimehr's work with open datasets, it can be observed that real-world data tend to produce lower precision. This is due to the fact that, under real conditions, various factors come into play, including the type of network, the nature of the end node, and the specific type of attack.

In his work, Stephan [13] used neural networks to set up a system for detecting attacks on a web server. The neural networks that were tested were defined with one hidden layer. When testing with 5 nodes in the hidden layer, the results obtained were 92.17% successful. Any deviation from the value of 5 nodes in the hidden layer dramatically degrades the pattern detection performance.

Tiwari [14] worked with another dataset, NSL-KDD [15], which is used for network security testing and feature analysis. Using classical machine learning models, Tiwari obtained the following results: the highest accuracy, 99.4%, was achieved using artificial neural networks. The next group of models has a similar accuracy of around 95% and includes the SVM, the Passive Aggressive Classifier, and the Ridge Classifier, followed by random forest with 94% and naive Bayes with 89%. It is also worth mentioning the decision tree with 79% accuracy. As noted so far, such high values are obtained with public datasets; with non-public datasets the values differ.

Mahmood [16] took private cloud network attacks as another example of the use of machine learning algorithms. Threats generated against cloud services cannot be classified into a separate group, but are an identical type of threat against the end user. In this example, the following algorithms were used: kNN, decision tree, SVM, random forest, SGD, NN, naive Bayes, logistic regression, gradient boosting, and AdaBoost. The traffic analysis was carried out continuously for 30 days. Based on the month-long analysis, the results confirm that the neural network algorithm does not lag behind other machine learning algorithms. In percentage terms, the following accuracy results were obtained: neural network 87.6%, random forest 86.9%, logistic regression 87.7%, decision tree 87.7%, SVM 87.6%, etc. The data obtained confirm that, for real datasets, the results remain below the 99% achieved with open datasets.
Lillmond [17], who analysed a deep neural network, obtained better results in his analysis. A deep neural network refers to a network with multiple hidden layers. The model used in this analysis had an action output attribute with four possible values: allow, deny, drop, and reset. The allow value accounted for 57.4%, drop 19.6%, and deny 22.9%. The analysis revealed that the deep neural network achieved 94.49% precision for the test model and 95.81% accuracy for the training model.

Thi-Thu-Huong [18] conducted a test attack using several models that included a small attack within the dataset. The attack models were denial of service (DoS), user-to-root (U2R), and remote-to-local (R2L). In this experiment, Thi-Thu-Huong used an open-source dataset, the KDD Cup [19]. The model had 80 hidden layers and 500 epochs. The best results came from the Leaky ReLU function, which achieved an accuracy of 0.97, while the ReLU function yielded the lowest result with an accuracy of 0.94.

Valentin [20] used 20,000 instances to evaluate a dataset with actions in three states: allow, deny, and reject. The training datasets were divided into negative and positive examples in an 80:20 ratio. The best performance of the neural network was achieved with 13 hidden neurons.

Habibur [21] focused on analysing firewall traffic using real data logs from a firewall, consisting of approximately 67,000 logs. The paper discussed a method involving an activation function and two hidden layers in a 3-4 configuration. The final prediction was 0.75, with better results obtained from the random forest and SVM models. The study did not specify whether two separate datasets were used, so it can be assumed that only one dataset was analysed.

Abien [22] worked with the standard MNIST dataset [23], which includes 60,000 training examples and 10,000 test cases. The paper implemented two different classification functions: softmax and ReLU. To evaluate the performance of the ReLU model, several metrics were used, including accuracy, standard deviation, recall, precision, F1 score, and the confusion matrix. Both functions yielded similar results for these metrics, ranging from 0.86 to 0.89, suggesting that Abien's future work may involve exploring deep-learning variations of the ReLU model.

These studies demonstrate that neural network algorithms can perform just as well as traditional machine learning algorithms. The addition of multiple layers allows for improved precision, a topic that will also be explored in this paper.

3. MATERIALS AND METHODS

In this paper, working equipment was used for the analysis to ensure that the results are in line with practical scenarios, which may differ from laboratory-based analyses. The Check Point firewall served as the central firewall system for the study. Firewalls are critical to preventing threats and offer protection against advanced attacks. The Check Point firewall provides several key functions, including deep learning capabilities, threat prevention (such as blocking zero-day DNS and phishing attacks), and protection of Internet of Things (IoT) devices. Furthermore, the firewall supports 2.5 Gbps threat prevention throughput, which improves its overall performance in real-world applications [24].

The robot system (BOT) used in this study was based on open-source software from the github.com portal [25]. Wireshark software [26] was used to detect transmission anomalies. For the machine learning analysis, the Orange software package was utilized [2]. Various neural network algorithms were applied, including the activation functions identity, logistic, tanh, and ReLU. The datasets used for the analysis were gathered directly from the firewall system under real working conditions.

3.1. Network Model for Attack Analysis and Detection

A network, as shown in Figure 5, was used to demonstrate how machine learning affects the operation of the firewall system and how it improves performance in terms of better detection and blocking capabilities.

Figure 5: Layout of the analysis grid.

Figure 5 illustrates an attack on the internal zone by two actors. One machine (Attacker2) executes the attack directly, while the other (CCC) controls the attack using BOT functions, engaging other users (BOTs) on the Internet to participate in the attack. The entire attack is executed using two different methods, ping flood and http flood. A BOT function is a software program that automatically performs repetitive and targeted tasks. BOTs can be used for some business tasks but also for malicious purposes and can have a severe impact on the local network. There are several types of BOT attacks [27], which are as follows.

Credential stuffing is when attackers use stolen login credentials to gain access to another website. Bots circumvent existing built-in security features in web application login forms by attempting multiple simultaneous logins from various device types and IP addresses.
Web/content scraping is when bots download content from a website to use it in future attacks. A website scraper bot sends a series of HTTP GET requests, copies, and saves the information, all in seconds.

DoS and DDoS attacks are carried out with networks of Internet-connected machines, such as computers or IoT devices. Once the network is infected, attackers send remote instructions to each bot to overload the server or network, causing outages and downtime.

Brute force password cracking is an attack that uses bots to attack and infiltrate protected accounts by trying every possible password combination or cracking encryption keys to gain unauthorized access to sensitive data.

Click fraud occurs when attackers target pay-per-click ads to boost the search rankings of a webpage through fake clicks.

This paper explores the use of bots to perform DDoS attacks on a local network server. The Python-based bot application includes server- and client-side code. Server-client communication (CCC and bots) typically occurs through port 5555, with options for alternative ports. Clients can launch two types of attack: http flood and ping flood, both implemented in the client-side code (see Listing 1). The server commands, outlined in Listing 2, determine which attack function is used. The attack source comprises the selected function and the target IP address.

Listing 1: Client-side code that implements http flood and ping flood attacks [25]

    # Imports required by this excerpt
    import os
    import signal
    import subprocess
    import time
    import urllib.request

    def run(self, n):
        # n[0] = target host, n[1] = target port, n[3] = attack type, n[4] = delay;
        # self._running and attackSet are flags managed elsewhere in the bot client.
        run = 0
        # terminate = 0
        if n[3] == "HTTPFLOOD":
            while self._running and attackSet:
                url_attack = 'http://' + n[0] + ':' + n[1] + '/'
                u = urllib.request.urlopen(url_attack).read()
                time.sleep(int(n[4]))
        if n[3] == "PINGFLOOD":
            while self._running:
                if attackSet:
                    if run == 0:
                        url_attack = ('ping ' + n[0] +
                                      ' -i 0.0000001 -s 65000 > /dev/null 2>&1')
                        pro = subprocess.Popen(url_attack,
                                               stdout=subprocess.PIPE,
                                               shell=True,
                                               preexec_fn=os.setsid)
                        run = 1
                else:
                    if run == 1:
                        os.killpg(os.getpgid(pro.pid), signal.SIGTERM)
                        run = 0
                        break

Listing 2: Server commands controlling client attack functions [25]

    ATTACK_TARGET_HOST = "192.168.0.105"
    ATTACK_TARGET_PORT = "3000"

    # Type of Attacks
    # HTTPFLOOD - Floods the target system with GET requests.
    # PINGFLOOD - Floods the target system with ICMP echo requests.
    ATTACK_TYPE = "PINGFLOOD"

    # Status codes that must be set from the list below.
    # HALT   - Stop attacks immediately.
    # LAUNCH - To immediately start the attack.
    # HOLD   - Wait for the command.
    # UPDATE - Update Client.
    ATTACK_CODE = "HALT"
In this paper, three datasets are defined for analysis. The first dataset represents the normal operation of the firewall system, capturing its status during periods without active attacks on specific groups or ports. This dataset serves as a training set. The second dataset focuses on detecting transmission anomalies during an active attack, making it the test set. Comparing the changes between these two datasets provides information on the firewall policies. The third dataset captures the data generated after the adjusted firewall policies are implemented, allowing the accuracy of anomaly detection and the effectiveness of the updated policies to be evaluated.

The dataset used for the analysis includes the following attributes:
o destination: specifies the destination IP address;
o interface direction: indicates incoming or outgoing traffic;
o type: identifies whether the session is a connection, log, or connection alert;
o source: displays the source IP address;
o product: categorizes the session as either a threat or access;
o blade: specifies the firewall component involved, such as antivirus, firewall, or VPN;
o source port;
o destination port;
o protocol;
o action: determines whether the session is accepted, dropped, detected, or prevented.

The action attribute can take on several values, including [1]:
o allow: allows communication between the source and destination addresses.
o detect: monitors specific traffic that bypasses initial detection.
o deny: blocks traffic between the source and destination due to policy restrictions and sends information to the sender.
o drop: blocks traffic between the source and destination without notifying the sender. This is often preferred for blocking potentially malicious traffic.
o prevent: stops unauthorised or malicious traffic targeting the destination address.
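As an illustration of how such firewall logs can be prepared for the kind of analysis used later, the sketch below loads one exported dataset with pandas and encodes the listed attributes. The file name is hypothetical, and the column names simply mirror the attribute list above.

    import pandas as pd

    # Hypothetical export of the no-attack dataset; columns follow the attribute list above.
    logs = pd.read_csv("firewall_no_attack.csv")

    # Class balance of the action attribute (cf. Figures 7-9).
    print(logs["action"].value_counts())

    # Simple feature encoding: categorical attributes one-hot encoded,
    # ports kept numeric; IP addresses are dropped here for brevity.
    features = pd.get_dummies(
        logs.drop(columns=["action", "source", "destination"]),
        columns=["interface direction", "type", "product", "blade", "protocol"])
    labels = logs["action"]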
Figure 6 presents the network model implemented in the Orange software package. This model is designed for the application of machine learning algorithms, which aligns with the focus of this study. The model utilises three datasets: no-attack, attack, and defence. The no-attack dataset represents the network state when no attack is occurring and is used to train the model. This dataset contains approximately 54,500 log instances. Figure 7 illustrates the numerical ratio of all three output states. The attack dataset represents the state of the network during an ongoing attack and is used to evaluate the model. This dataset contains approximately 64,100 log instances. Figure 8 illustrates the numerical distribution of all three output states. The defence dataset represents the network state after the firewall policy has been corrected. It serves as an additional analysis to evaluate the firewall system's operation, determining whether the policy has been applied successfully and whether the corrections have been effective. This dataset contains approximately 62,800 log instances. Figure 9 illustrates the numerical ratio of all three output states.

Figure 7: Numerical ratio of output states in the no-attack dataset (Accept 45,519; Drop 9,208; Prevent 19).

Figure 8: Numerical ratio of output states in the attack dataset (Accept 55,808; Drop 8,348; Prevent 13).

Figure 9: Numerical ratio of output states in the defence dataset (Accept 52,718; Drop 10,144; Prevent 24).
Figure 10: Total packet counts showing increases during the attack and defence phases.

Table 2 presents the percentage ratios for the allow, drop, and prevent outputs, calculated relative to the total number of output values. It shows that the percentage of allowed packets increased from the no-attack dataset to the attack dataset, meaning that the firewall did not respond initially. For smaller attacks, the firewall did not register significant concern. However, after adjusting the firewall policy, there was a reduction in allowed packets and a corresponding increase in rejected packets, as illustrated in Figure 11.

Table 2. Percentage ratios of allow, drop, and prevent outputs calculated relative to the total number of output values

              allow/all   drop/all   prevent/all
  no-attack   83%         16.8%      0.034%
  attack      86.9%       13%        0.020%
  defence     83.8%       15.8%      0.038%

Figure 11: Drop and Prevent counts in the NoAttack, Attack, and Defence datasets.

The probability density function of the beta distribution for variable x and the shape parameters α and β is given by the following equation:

f(x, α, β) = x^(α−1) (1 − x)^(β−1) / B(α, β)    (5)
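A quick numerical check of equation (5) can be made with SciPy, whose beta distribution implements the same density; the shape parameters and evaluation point below are illustrative only.

    from scipy.special import beta as B
    from scipy.stats import beta

    a, b, x = 2.0, 5.0, 0.3                             # illustrative parameters and point
    manual = x**(a - 1) * (1 - x)**(b - 1) / B(a, b)    # equation (5)
    print(manual, beta.pdf(x, a, b))                    # both give the same density value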
A change in several characteristics is observed in the TCP and UDP packets, as shown in Figure 12. The figures indicate values for the Generic Routing Encapsulation (GRE) protocol, labelled as value 47, and the Internet Control Message Protocol (ICMP). However, the values for these protocols are negligible. As shown in Figure 12, an increase in activity is observed on the UDP ports, suggesting additional network activity. To further confirm this, Figure 13 shows the increase in the number of UDP sessions, indicating a significant increase in UDP traffic. Given that static NAT is configured in the firewall system, the local IP address 10.0.80.20 is mapped to a public IP address (not shown for security reasons), and the specific attacks are made against that public IP address. Figure 14 shows multiple connections from a single public IP address to the local server at 10.0.80.20.

A more detailed analysis of the number of sessions and associated IP addresses was conducted using Python programming. The analysis covered a 10-minute period. The results of this analysis, shown in Figures 15 and 16, highlight the findings derived from the Python scripts.
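The Python scripts themselves are not reproduced in the paper; the following pandas sketch only illustrates the kind of 10-minute, per-source-IP session count described above. The file name, column names, and timestamp format are assumptions.

    import pandas as pd

    logs = pd.read_csv("attack_capture.csv", parse_dates=["time"])

    # Restrict the log to the last 10 minutes of the capture.
    start = logs["time"].max() - pd.Timedelta(minutes=10)
    window = logs[logs["time"] >= start]

    # Sessions per source IP towards the NAT-ed server (cf. Figures 14-16).
    per_source = (window[window["destination"] == "10.0.80.20"]
                  .groupby("source")
                  .size()
                  .sort_values(ascending=False))
    print(per_source.head(10))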
Figure 12: Examples of TCP and UDP views a) before and b) after the attack.

Figure 13: Displays of the number of received packets a) before and b) after the attack.

Figure 15: Display of the public IP address as destination (attack victim).
Figure 18: Display of the server-side protocols.

With two key variables, the type of attack and the IP address from which the attack originates, the firewall system can be configured to block the attack. The process of setting the policy begins at the firewall, where the policy is applied between the outside and inside zones. In the first step, for the predefined Check Point firewall solution, the following parameters must be set:
o policy name: This attribute does not affect the policy's functionality. However, when many policies follow the "top-down" execution principle (where the first named policy is executed first), the policy name plays a significant role in the execution order.
o source and destination addresses: These fields directly impact the policy's functionality. Specifically, sessions between the specified IP addresses will be blocked, as recorded by the firewall device. Care must be taken to avoid conflicts with IP addresses from other policies, ensuring the proper functioning of certain services.
o protocol settings: The final part of the policy defines which protocols are involved, whether they should be blocked, and whether a log entry should be created when the policy is triggered.

After explaining the firewall system, this work will explore the concept of the neural network and evaluate the effectiveness of the protection mechanism it provides.

3.2 Attack Analysis through Machine Learning Algorithms

In the following sections of the paper, the impact of the predefined datasets on the neural network, which plays a crucial role in the operation of this network, will be explored.

Weights and biases: Weights are parameters that determine the strength of connections between neurons in different layers of the neural network. Each connection between neurons has an associated weight, which is updated during training through optimization algorithms. The weight value is learned during this process and plays an important role in the performance of the network. Bias, on the other hand, is a parameter that helps the model better fit the data. It is added to the weighted sum of inputs in each neuron, allowing the network to account for discrepancies between the predicted and actual outputs. Like weights, biases are also learned during the training process and contribute to the network's performance optimization.

Number of neurons: Neural networks can consist of multiple layers with different structures, which influences the model's ability to learn complex functions. Currently, there are no definitive recommendations on the optimal number or types of hidden layers needed to achieve satisfactory results. To evaluate the data presented in this study, several models with varying numbers of neurons per layer were tested, including models with two and three layers. The accuracy results for the neural network, using the training dataset, are presented in Table 3.

The best precision among the two-layer models was achieved with a 5-3 configuration, as demonstrated by Čisar in [1], thus confirming the effectiveness of the two-layer model. Further analysis with a three-layer model showed improved precision compared to the two-layer model, specifically with a 5-5-4 configuration for our data. The structure of the neural network, based on the number of neurons per layer, is illustrated in Figure 19.

The parameters of the neural network are influenced by the pre-defined algorithms. For the dataset used in this study, an accuracy comparison was performed with the proposed algorithm. The results of this comparison are presented in Table 4.

Table 3. Accuracy value related to the number of neurons and layers

  Number of neurons and layers   Accuracy
  3-5                            0.940
  4-5                            0.945
  5-5                            0.954
  5-3                            0.957
  5-4                            0.949
  5-4-3                          0.956
  5-5-3                          0.960
  5-5-4                          0.963
  6-5-4                          0.954
  4-5-4                          0.946
  4-5-5                          0.960
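For readers who want to reproduce the configuration outside the Orange GUI, the sketch below builds an equivalent 5-5-4 model with scikit-learn, which Orange's Neural Network widget is based on. The synthetic data only stands in for the encoded firewall attributes and action labels, and the hyperparameter names are scikit-learn's, not values quoted from the paper.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPClassifier

    # Synthetic stand-in for the encoded firewall log attributes and action labels.
    X, y = make_classification(n_samples=2000, n_features=10, n_informative=6,
                               n_classes=3, random_state=0)

    model = MLPClassifier(hidden_layer_sizes=(5, 5, 4),  # the 5-5-4 topology from Table 3
                          activation="relu",             # ReLU activation, as in the paper
                          solver="adam",                 # Adam optimizer discussed earlier
                          max_iter=1000,
                          random_state=0)

    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(scores.mean())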
Figure 19: Structure of the neural network with the number of neurons per layer (5-5-4 configuration).
Table 4. Accuracy comparison for the proposed activation functions

  Algorithm (NN)   AUC     CA      F1      Precision
  Identity         0.950   0.942   0.966   0.935
  Logistic         0.965   0.964   0.979   0.947
  Tanh             0.976   0.964   0.997   0.961
  ReLU             0.977   0.964   0.979   0.963

In Table 4 there are only small deviations in the values for the displayed functions. The deviations appear at the third decimal place, which indicates that the final values in the further examination will differ only slightly for the mentioned functions. Based on the results obtained, the next phase of the analysis will focus on the ReLU function and the 5-5-4 neural network model. After defining the model, along with the number of neurons, layers, and algorithms, further network tests can be conducted to improve the performance of the firewall solution. The following characteristics were considered during this process.

Area under the ROC (Receiver Operating Characteristic) curve (AUC): The AUC represents the area under the ROC curve, which assesses the model's ability to distinguish between true positives and false positives. Values range from 0 to 1, where a value of 1 indicates that the model perfectly separates the specified classes. The AUC is always a positive number and can be calculated using the following equation:

AUC = ∫₀¹ ROC(u) du    (6)

In this case, Table 5 gives the AUC values for all three scenarios. The average values dropped by 4.61% compared to the training data. However, after correcting the firewall solution, they almost returned to their initial values. Figure 20 shows the ROC curve for the drop value in the three datasets. In this figure, the graph reveals a decrease in the area under the ROC curve during a system attack. These changes are also reflected in Table 5. By taking prompt action on the firewall features, the system returned largely to its initial state.

Classification Accuracy (CA): CA represents the proportion of correctly classified examples related to the total number of examples. The formula for CA is as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (7)

where TP is true positive, TN is true negative, FP is false positive, and FN is false negative. Table 6 presents the CA values. When comparing the test model to the training model, the CA drops by 1.24%, as shown in Table 6. Similarly to the AUC parameter, there is no return to the initial values after intervention in the firewall system.

F-score (F1): The F-score measures predictive performance, particularly when dealing with unbalanced datasets. The results for the described model are presented in Table 7. A 1.56% drop relative to the training model is observed, but the values return close to nominal after the firewall intervention.

Precision: Precision represents the proportion of true positive events among the cases classified as positive. The results are shown in Table 8.

Recall: Recall represents the proportion of true positive events among all positive instances. The results are provided in Table 9.

The data was recorded in three cycles. The first cycle captured characteristics when there was no attack on the system. The second cycle recorded the state during the attack and was used for training. The third cycle, confirmation, recorded the firewall status after the necessary corrections were made.
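The metrics above can be computed directly from the model outputs. The sketch below uses scikit-learn with a tiny dummy example; macro averaging and a one-vs-rest AUC are assumptions, since the paper does not state how the multi-class values in Tables 5-9 were averaged.

    import numpy as np
    from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                                 recall_score, roc_auc_score)

    # Dummy ground truth, predicted classes, and class probabilities.
    y_true = np.array([0, 0, 1, 1, 2, 2])
    y_pred = np.array([0, 1, 1, 1, 2, 2])
    y_prob = np.array([[0.8, 0.1, 0.1], [0.4, 0.5, 0.1], [0.2, 0.7, 0.1],
                       [0.1, 0.8, 0.1], [0.1, 0.2, 0.7], [0.2, 0.1, 0.7]])

    ca = accuracy_score(y_true, y_pred)                      # equation (7)
    f1 = f1_score(y_true, y_pred, average="macro")
    prec = precision_score(y_true, y_pred, average="macro")
    rec = recall_score(y_true, y_pred, average="macro")
    auc = roc_auc_score(y_true, y_prob, multi_class="ovr")   # area under the ROC curve, eq. (6)
    print(ca, f1, prec, rec, auc)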
Table 5. AUC values across three dataset scenarios

  Target    Training   Testing   Confirmation
  Accept    0.977      0.930     0.975
  Detect    1          1         1
  Drop      0.977      0.932     0.977
  Prevent   1          1         1
  Average   0.977      0.932     0.977
Detect 0 0 0
Drop 0.886 0.791 0.878
Prevent 1 0.839 1
Average 0.963 0.949 0.962
… in Figure 22.

Figure 21: Values of different parameters for activation functions.

Table 10. Values of Drop results for all functions

  Target     AUC     CA      F1      Precision   Recall
  ReLU       0.932   0.952   0.791   0.908       0.701
  Tanh       0.929   0.946   0.768   0.862       0.693
  Logistic   0.910   0.949   0.782   0.782       0.709
  Identity   0.913   0.936   0.692   0.931       0.505

Figure 22: Values of defined deviations.

5. COMPARING THE RESULTS WITH A LARGER DATASET

In this paper, the analysis was done with a relatively small number of samples, up to 54,500 defined sessions. The values of some parameters are quite high. Table 11 shows all parameters within the ReLU function, but for a dataset containing 1,000,000 sessions.

Table 11. Values for the complete dataset

  Target    AUC     CA      F1      Precision   Recall
  Accept    0.809   0.837   0.896   0.820       0.987
  Detect    1.000   0.837   0.782   0.642       1.000
  Drop      0.998   0.837   0.997   1.000       0.993
  Prevent   0.967   0.837   0.000   0.000       0.000
  Reject    0.753   0.837   0.357   0.834       0.227

Because directly comparing two datasets whose values differ considerably is difficult, the comparison is made only with respect to the CA parameter, i.e. the proportion of correctly classified examples. Looking at the CA values in Table 11, we can state that there was a drop in correctly classified sessions of about 12%. This large value indicates that, for better detection of malicious sessions, it is necessary to use a smaller dataset to improve accuracy.

6. CONCLUSION

This paper demonstrates the application of neural networks with the ReLU activation function, aiming to optimize firewall policies for the detection and mitigation of network attacks. The proposed 5-5-4 model achieved the highest accuracy of 96.3%, surpassing two-layer architectures by 0.6% and the next-best three-layer architecture by 0.3%. The analysis, based on real working environment datasets, confirmed that this approach improves the accuracy of attack detection and the effectiveness of firewall policy adjustments, even in dynamic and complex environments.

During the testing, a decrease in performance metrics was observed: the area under the curve (AUC) decreased by 4.61%, the classification accuracy (CA) by 1.24%, the F1 score by 1.56%, precision by 1.3%, and recall by 1.2% compared to the training values. After applying optimized firewall policies, most parameters returned close to their initial values, demonstrating the reliability of the proposed model in real working scenarios.

The study used datasets consisting of approximately 54,500 instances for training, 64,100 instances during active attacks, and 62,800 instances after firewall policy adjustments. This ensures that the results are based on realistic conditions and validates their practical applicability. These findings are particularly relevant in environments such as IoT systems, where quick detection and prevention of attacks are essential to maintain network security.

Although the results are encouraging, the study is limited by the use of a single dataset and software platform. Future research should focus on evaluating the model's adaptability to larger and more diverse datasets (IoT) and exploring its integration into broader cybersecurity frameworks to enhance scalability and applicability.

This study confirms the potential of machine learning, particularly neural networks, as an effective tool for improving network security. The proposed approach provides a solid foundation for further research and development aimed at creating more adaptive and robust cybersecurity solutions for increasingly complex and interconnected systems.
REFERENCES

[1] Čisar, P., Popović, B., Kuk, K., Čisar, S., Vuković, I., "Machine Learning Aspects of Internet Firewall Data," Springer, 10.1007/978-94-024-2174-3_4. 2022.
[2] Demsar, J., Curk, T., Erjavec, A., Gorup, C., Hocevar, T., Milutinovic, M., Mozina, M., Polajnar, M., Toplak, M., Staric, A., Stajdohar, M., Umek, L., Zagar, L., Zbontar, J., Zitnik, M., Zupan, B., "Orange: Data Mining Toolbox in Python," Journal of Machine Learning Research, vol. 14, no. Aug, pp. 2349–2353. 2013.
[3] Petrović, M., "Osnovi veštačkih neuronskih mreža i značaj njihove primene," Zbornik radova Građevinskog fakulteta, Subotica, no. 20, pp. 47–55. 2011.
[4] Knežević, S., Mileta, Ž., Žarković, "Predviđanje proizvodnje termoelektrane pomoću neuralnih mreža," Energija, ekonomija, ekologija, vol. XXV, pp. 38–41. 2023. https://doi.org/10.46793/EEE23-4.38K
[5] Kramberger, T., Nozica, B., Dodig, I., Cafuta, D., "Pregled tehnologija u neuronskim mrežama," 10.19279/TVZ.PD.2019-7-1-04. 2019.
[6] Nikolić, M., Zečević, A., "Machine learning," Faculty of Mathematics, Belgrade, Serbia, 2019.
[7] Coraline Ada Ehmke, "How Do Neural Networks Make Decisions: A Look at Activation Functions," Accessed: Nov. 23, 2024, Available: https://www.goglides.dev/bkpandey/how-do-neural-networks-make-decisions-a-look-at-activation-functions-141e
[8] Dubey, S.R., Singh, S.K., Chaudhuri, B.B., "Activation functions in deep learning: A comprehensive survey and benchmark," Neurocomputing, vol. 503, pp. 92–108. 2022. https://doi.org/10.1016/j.neucom.2022.06.111
[9] Bai, Y., "RELU-Function and Derived Function Review," SHS Web of Conferences, vol. 144, 02006. 2022. https://doi.org/10.1051/shsconf/202214402006
[10] PyTorch Contributors, Accessed: Nov. 23, 2024, Available: https://pytorch.org/docs/stable/generated/torch.nn.ELU.html
[11] Maslan, A., Mohamad, K., Mohd Foozy, C.F., "Feature selection for DDoS detection using classification machine learning techniques," IAES International Journal of Artificial Intelligence (IJ-AI), vol. 9, pp. 137–145. 2020. https://doi.org/10.11591/ijai.v9.i1.pp137-145
[12] Najafimehr, M., Zarifzadeh, S., Mostafavi, S., "A hybrid machine learning approach for detecting unprecedented DDoS attacks," J Supercomput., vol. 78, no. 6, pp. 8106–8136. 2022. https://doi.org/10.1007/s11227-021-04253-x
[13] Stephan, J., Sahab, M., Abbas, M., "Neural Network Approach to Web Application Protections," International Journal of Information and Education Technology, vol. 5, no. 2, February 2015.
[14] Tiwari, S., Kumar, N., Joshi, K., Kumar, S., "Enhancing Cyber Security: A Comparative Study of Artificial Neural Networks and Machine Learning for Improved Network Vulnerability Detections," Advanced Technologies for Realizing Sustainable Development Goals: 5G, AI, Big Data, Block Chain and Industry 2.0 Applications, Bentham Books, Singapore, 2024.
[15] Ghulam Mohi-ud-din, "NSL-KDD," IEEE Dataport, December 29, 2018, doi: https://dx.doi.org/10.21227/425a-3e55
[16] Mahmood, S., Hasan, R., Yahaya, A., Hussain, S., Hussain, M., "Evaluation of the Omni-Secure Firewall System in a Private Cloud Environment," Knowledge, vol. 4. 2024. https://doi.org/10.3390/knowledge4020008
[17] Lillmond, C., Suddul, G., "A Deep Neural Network Approach for Analysis of Firewall Log Data," 10.13140/RG.2.2.27458.04808. 2021.
[18] Le, T.-T.-H., Kim, J., Kim, H., "Analyzing Effective of Activation Functions on Recurrent Network for Intrusion Detection," JMIS, vol. 3, no. 3, pp. 91–99. 2016. https://doi.org/10.9717/JMIS.2016.3.3.91
[19] Stolfo, S., Fan, W., Lee, W., Prodromidis, A., Chan, P., "KDD Cup 1999 Data," UCI Machine Learning Repository, Accessed: Nov. 23, 2024, Available: https://doi.org/10.24432/C51C7N
[20] Valentin, K., Maly, M., "Network firewall using artificial neural networks," Computing and Informatics, vol. 32, pp. 1312–1327. 2013.
[21] Rahman, M.H., Islam, T., Rana, M.M., Tasnim, R., Mona, T., Sakib, M., "Machine Learning Approach on Multiclass Classification of Internet Firewall Log Files," 10.48550/arXiv.2306.07997. 2023.
[22] Agarap, A.F., "Deep Learning using Rectified Linear Units (ReLU)," 10.48550/arXiv.1803.08375. 2018.
[23] LeCun, Y., Cortes, C., Burges, C., "MNIST Handwritten Digit Database," AT&T Labs, vol. 2, Accessed: Nov. 23, 2024, Available: http://yann.lecun.com/exdb/mnist
[24] Checkpoint, Accessed: Nov. 23, 2024, Available: https://www.checkpoint.com/
[25] Shankar Narayana Damodaran, Accessed: Nov. 23, 2024, Available: https://github.com/skavngr/netbot
[26] "Wireshark: Network Analyzer," Accessed: Nov. 23, 2024, Available: https://www.wireshark.org
[27] Cloudflare, Accessed: Nov. 23, 2024, Available: https://www.cloudflare.com/learning/bots/what-is-a-bot-attack

Dragan Jevtić, Infrastructure of Serbian Railway, Belgrade, Serbia (e-mail: [email protected])
Petar Čisar, University of Criminal Investigation and Police Studies, Belgrade, Serbia (e-mail: [email protected]), 0000-0002-8129-288X; John von Neumann University, GAMF Faculty of Engineering and Computer Science, Kecskemét, Hungary (e-mail: [email protected])
Kristijan Kuk, University of Criminal Investigation and Police Studies, Belgrade, Serbia (e-mail: [email protected]), 0000-0001-8910-791X
Vladica Stojanović, University of Criminal Investigation and Police Studies, Belgrade, Serbia (e-mail: [email protected]), 0000-0002-3819-4387

Dragan Jevtić is a PhD student at the University of Criminal Investigation and Police Studies. He is currently working as Project Manager in the IT Department, Infrastructure of Serbian Railways. His research is mainly focused on using machine learning in various situations in railway environments, ranging from detecting attacks (from the Internet and the intranet) by rapidly analysing large amounts of data to investigating anomalies in transmission or in the attack itself.

Petar Čisar graduated from the University of Belgrade School of Electrical Engineering and earned a PhD in Information Sciences from the University of Novi Sad. He is a full professor at the University of Criminal Investigation and Police Studies, Belgrade, and an associate professor at John von Neumann University, Kecskemét. A member of the International Society for the Implementation of Fuzzy Theory in Budapest and an external member of the Hungarian Academy of Sciences and Arts, he has authored over 150 scientific papers, with 400+ independent citations. His research focuses on computer and telecommunication networks, network security, digital forensics, and AI implementations.
Kristijan Kuk earned his M.Sc. from the
Technical Faculty in Zrenjanin, University of
Novi Sad, and his PhD in Informatics and
Computing Science from the Faculty of
Electronic Engineering, University of Niš. He
is a full professor at the Faculty of Computer
Science and Information Technology,
University of Criminal Investigation and
Police Studies in Belgrade. He has authored
over 15 papers that have been published in scientific journals
from SCI/E lists; two papers have been published as a chapter
for Springer and two chapters for Elsevier books. His research
interests include intelligent agents, data mining techniques,
and secure software development.