High Performance Intrusion Detection System using eBPF
Research Article
Keywords: DoS, DDoS, eBPF, Random Forest, Decision Tree, SVM, TwinSVM
DOI: https://doi.org/10.21203/rs.3.rs-3140072/v1
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Abstract
Denial of Service (DoS) and Distributed DoS (DDoS) attacks are common problems for organizations that rely on network services. Detecting these attacks promptly and accurately is crucial to mitigating the damage they cause. This paper proposes an Intrusion Detection System (IDS) that combines the extended Berkeley Packet Filter (eBPF) with machine learning (ML) algorithms, namely Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), and TwinSVM. eBPF is a bytecode-based virtual machine that runs programs in the kernel without modifying the kernel source code, and it can implement services such as observability, security, and networking. Socket filters are eBPF programs attached to a socket in the Linux kernel; they allow efficient filtering and manipulation of network packets at the socket after the packets have passed through the network stack, so packets are filtered at the socket level before they reach user space. The proposed model involves the following steps: a) collecting data from the well-known CIC-IDS-2017 repository; b) preprocessing the raw data, which includes data transformation, cleaning, reduction, and discretization; c) selecting the most relevant features from the preprocessed data with an ANOVA F-test; d) analyzing the selected features for intrusion detection with the ML algorithms DT, RF, SVM, and TwinSVM; and e) capturing network traffic with an eBPF program that applies the trained model parameters to detect attacks within the kernel.
Our experimental results show test accuracies of 99.38%, 99.44%, 88.73%, and 93.82% for DT, RF, SVM, and TwinSVM, respectively; DT and RF outperform the existing related work. The experimental code is available at https://github.com/NemalikantiAnand/Project.git
Fig. 1: eBPF Architecture
1 Introduction
eBPF stands for extended Berkeley Packet Filter. As the name suggests, it began as a packet filter, but it is now also used for performance monitoring, tracing, and optimization. It gives users the ability to build real-time programs that interact with the Linux kernel and other components of the system. In the past, practically all web content was written in HyperText Markup Language (HTML). Browsing has since evolved into full-fledged applications, and web-based technology has largely supplanted traditional software. This evolution was made possible by programmability, which arrived with JavaScript [1][2][3].
In the same way, eBPF makes the Linux kernel dynamically programmable: it plays the role for the kernel that JavaScript plays for HTML, and it is transforming the kernel much as JavaScript transformed the web. eBPF lets users run sandboxed programs in a privileged context inside the operating system kernel [4]. Because these programs run inside the kernel, they avoid the overhead of moving data to user space, and unlike native kernel modules they can be loaded safely without changing the kernel source code. eBPF supports a wide range of use cases, from simple network monitoring to complex performance optimization and security checks [1][2][3]. eBPF programs are event-driven: they run when the kernel or an application passes a hook point. Predefined hook points include system calls, function entry and exit, kernel tracepoints, and network events. Where no predefined hook exists, kernel probes (kprobes) and user probes (uprobes) can attach eBPF programs almost anywhere in the kernel or in user applications [1][2][3][4]. The eBPF architecture is shown in Figure 1. Architecturally, eBPF consists of a virtual machine, a set of libraries, and a set of maps: the virtual machine executes the eBPF program, while the libraries provide helper functions [1][2][3][4] that the program can call.
eBPF is a framework that lets users load their own custom programs into the operating system kernel and execute them there. When an eBPF program is loaded, a verifier checks whether it is safe to execute and rejects it if it is not. Once loaded, the program must be attached to an event so that it runs whenever that event occurs. The LLVM compiler [5] translates restricted C code into eBPF bytecode; this step is necessary because the Linux kernel expects eBPF programs to be loaded as bytecode. Programs are loaded into the kernel with the BPF system call, usually through one of the available eBPF libraries. Before a program is attached to its hook, it goes through two steps: verification, which ensures that the program is safe to run, and Just-in-Time (JIT) compilation [6][1][2][3][4], which optimizes execution speed by translating the generic bytecode into the machine-specific instruction set. This allows eBPF programs to run as efficiently as natively compiled kernel code or code loaded as a kernel module [1][2][3][4].
eBPF programs can store the information they gather and share it with one another. For this purpose they use eBPF maps [1][2][3][4], key-value stores comparable to arrays or hash tables that let eBPF programs store and retrieve data in real time. User-space applications access eBPF maps through a system call. Supported map types include hash tables, arrays, Least Recently Used (LRU) maps, ring buffers, and Longest Prefix Match (LPM) tries, among others. eBPF programs are also composable through function calls and tail calls. Function calls allow functions to be defined and invoked within an eBPF program. Tail calls [1][2][3][4], akin to the execve() system call for ordinary processes, allow one eBPF program to call and run another while replacing the execution context. eBPF programs are compact, efficient programs that run directly within the Linux kernel. They are written in a restricted subset of C, referred to as Restricted-C, which was designed to make the environment in which eBPF programs run as safe and efficient as possible. Because it supports only a subset of C's functionality and data types, running eBPF code in the kernel is far less risky. Two of the most significant constraints are that loops must be bounded [7] and that floating-point arithmetic cannot be used. Because eBPF programs can be attached to a wide variety of system events, including system calls and network packets, they enable real-time interaction with the system.

Table 1: Overall dataset description of CIC-IDS-2017
No. | Class | Number of records | Share of total data
1 | BENIGN | 2273097 | 80.30%
2 | DoS Hulk | 231073 | 8.16%
3 | DDoS | 128027 | 4.52%
4 | DoS GoldenEye | 10293 | 0.36%
5 | DoS slowloris | 5796 | 0.20%
6 | DoS Slowhttptest | 5499 | 0.19%
7 | PortScan | 158930 | 5.61%
8 | FTP-Patator | 7938 | 0.28%
9 | SSH-Patator | 5897 | 0.21%
10 | Bot | 1966 | 0.06%
11 | Web Attack-Brute Force | 1507 | 0.05%
12 | Web Attack-XSS | 652 | 0.02%
13 | Infiltration | 36 | 0.001%
14 | Web Attack-Sql Injection | 21 | 0.0007%
15 | Heartbleed | 11 | 0.0003%
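The percentages in Table 1 are each class's record count divided by the total number of records. The short Python sketch below, using the counts copied from Table 1, reproduces that arithmetic and makes the class imbalance explicit; the variable and function names are illustrative only.

```python
# Record counts per class, copied from Table 1 (CIC-IDS-2017).
CLASS_COUNTS = {
    "BENIGN": 2273097,
    "DoS Hulk": 231073,
    "DDoS": 128027,
    "DoS GoldenEye": 10293,
    "DoS slowloris": 5796,
    "DoS Slowhttptest": 5499,
    "PortScan": 158930,
    "FTP-Patator": 7938,
    "SSH-Patator": 5897,
    "Bot": 1966,
    "Web Attack-Brute Force": 1507,
    "Web Attack-XSS": 652,
    "Infiltration": 36,
    "Web Attack-Sql Injection": 21,
    "Heartbleed": 11,
}


def class_shares(counts):
    """Return each class's share of all records, in percent."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


total = sum(CLASS_COUNTS.values())
shares = class_shares(CLASS_COUNTS)
attack_share = 100.0 - shares["BENIGN"]  # every non-benign class counts as an attack
print(f"total records: {total}")
print(f"BENIGN: {shares['BENIGN']:.2f}%  attacks: {attack_share:.2f}%")
```

Run as-is, it reports 2,830,743 records in total, with BENIGN at about 80.30% and all attack classes together at about 19.70%. A few table entries look truncated rather than rounded (Bot, for instance, computes to roughly 0.069%).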
A typical workflow for building and deploying an eBPF program is as follows. eBPF programs are written in restricted C and use maps to hold state. Because the Linux kernel expects eBPF programs to be loaded as bytecode, LLVM compiles the restricted-C code into eBPF bytecode. The bytecode is then loaded into the kernel's eBPF verifier through the BPF system call, most commonly via a library such as the BPF Compiler Collection (BCC) [8]. Once the verifier confirms that the program is correctly written and safe to run, it is passed to the JIT compiler, which converts the bytecode into native machine code [9]. After being loaded into the kernel, an eBPF program must be attached to an event before it can be used; whenever the event occurs, the attached program (or programs) runs. In this work, a socket filter attaches the eBPF program to a socket on a network interface [10]. As a result, packets pass through the eBPF program in kernel space before they are delivered to the user-space process.

Table 2: Extracted top 15 features of the CIC-IDS-2017 dataset for DT and RF
F.No | Feature | Description
1 | Idle Min | Minimum idle time observed in a network flow
2 | Bwd Packet Length Min | Minimum length of the backward packets
3 | Idle Mean | Average idle time in a network flow
4 | Fwd IAT Total | Total inter-arrival time between consecutive forward packets
5 | Bwd Packet Length Mean | Average length of the backward packets
6 | Fwd IAT Mean | Average inter-arrival time between consecutive forward packets
7 | Min Packet Length | Smallest length observed among all the packets
8 | Packet Length Mean | Average length of all the packets
9 | Fwd IAT Max | Maximum inter-arrival time between consecutive forward packets
10 | Average Packet Size | Mean packet size observed in a network flow
11 | Max Packet Length | Maximum length observed among all the packets
12 | Packet Length Variance | Variance of packet lengths in a network flow
13 | Avg Bwd Segment Size | Average size of backward segments in a network flow
14 | Bwd Packet Length Max | Maximum length of the backward packets
15 | Idle Max | Maximum idle time observed in a network flow
2 Related Work
Bachl et al. [11] propose a flow-based IDS implemented with ML in eBPF. Using the widely used CIC-IDS-2017 dataset, they trained a DT with scikit-learn with a maximum of one thousand leaves, a maximum depth of ten, and a training-to-testing ratio of two to one, obtaining an accuracy of 0.99 on the test set. Using this trained DT model, they implemented the same IDS both in user space and in eBPF. The two implementations share identical code except for the data structures: eBPF programs rely on eBPF maps rather than standard data structures, and because eBPF hash maps are not available in an ordinary C user-space application, the user-space version uses a simple hash map based on code taken from the Linux kernel. The two implementations were run sequentially rather than simultaneously, each for ten seconds. The user-space implementation processed 125420 packets per second, compared to 152274 for eBPF, making eBPF about 21% faster than user space.

Table 3: Extracted top 10 features of the CIC-IDS-2017 dataset for SVM and TwinSVM
No | Feature | Description
1 | Idle Mean | Average idle time in a network flow
2 | Fwd IAT Max | Maximum inter-arrival time between consecutive forward packets
3 | Packet Length Mean | Average length of all the packets
4 | Idle Min | Minimum idle time observed in a network flow
5 | Bwd Packet Length Max | Maximum length of the backward packets
6 | Max Packet Length | Maximum length observed among all the packets
7 | Avg Bwd Segment Size | Average size of backward segments in a network flow
8 | Packet Length Variance | Variance of packet lengths in a network flow
9 | Idle Max | Maximum idle time observed in a network flow
10 | Bwd Packet Length Mean | Average length of the backward packets
Wang and Chang [12] design and implement an IDS that uses eBPF inside the Linux kernel. Their IDS consists of two cooperating components. The first runs in the Linux kernel and uses eBPF for fast pattern matching, pre-dropping the large fraction of packets that cannot match any rule; this saves bandwidth. The second runs in user space and inspects the packets that survive the first stage to find the rules they match. Under many measured conditions, the maximum throughput of their IDS exceeds that of Snort by up to a factor of three.

Fig. 2: Proposed model using RF and DT algorithms
Vieira et al. [2] state that eBPF enables runtime modification of, interaction with, and programmability of the kernel. The eXpress Data Path (XDP) framework uses eBPF programs to process packets close to the NIC for fast packet processing. Programs can be written in C or P4 [13], compiled into eBPF instructions, and then loaded into the kernel, which provides the eBPF runtime environment. Their survey covers both the theory and the practice of fast packet processing with eBPF and XDP. On the theoretical side, it covers the BPF and eBPF machines and the eBPF subsystem of the Linux kernel; on the practical side, it demonstrates eBPF and the XDP hook with examples and tools. The authors argue that, because eBPF and XDP process packets quickly, they may open up new research directions.
Fig. 3: Proposed model using SVM and TwinSVM algorithms
Snort [14] and Suricata [15] can both filter packets using eBPF. However, neither uses eBPF to match patterns in the packet content; they parse only as far as the layer-3 header. Snort's -F command-line option lets users supply a tcpdump-style filter expression, which is then translated into eBPF instructions. Similarly, Suricata uses eBPF for XDP-based flow bypassing, load balancing, and packet filtering. The main difference between our solution and Suricata is that we support custom eBPF programs instead of predefined expressions. In addition, by using eBPF, our system can inspect packet payloads, whereas Suricata does not perform pattern matching in its eBPF programs; deep packet inspection (DPI) with eBPF is therefore not possible in Suricata. An eBPF-based DPI method [16] was developed to identify the types of video frames carried in packets, but it handles packets in only one format, which is inadequate for an IDS. ebpH [17] is an eBPF-based host IDS; however, it detects system anomalies rather than network anomalies.

Fig. 4: ANOVA F-test on the top 15 features of the DoS/DDoS dataset for Random Forest and Decision Tree
Fig. 5: ANOVA F-test on the top 15 features of the overall (all packets inspection) dataset for Random Forest and Decision Tree
The authors of [2] carried out a comprehensive analysis of eBPF. Their results include technical details and a breakdown of the areas in which eBPF has been applied. eBPF has been used to reimplement many fundamental network functions, including routing [18], switching [19], and firewalls [20], as well as load balancing [21], key-value storage [22], application-level filtering [23], and DDoS mitigation [24]. In-KeV [25] is a framework for developing in-kernel network services; it uses eBPF tail calls to build in-kernel Service Function Chains (SFCs). The authors of [26] report their experience with eBPF and explore its limits. The authors of [27] examined the performance of eBPF for filtering packets on the 5-tuple in the packet header. As 5G networks become more common, [28] used eBPF to build an open-source 5G mobile gateway. The authors of [29] used eBPF to monitor communication between virtual machines (inter-VM traffic). Finally, [30] proposes a framework for implementing eBPF-based network functions for microservices.

Fig. 6: ANOVA F-test on the top 10 features of the overall (all packets inspection) dataset for SVM and TwinSVM
Fig. 7: ANOVA F-test on the top 10 features of the DoS/DDoS dataset for SVM and TwinSVM
When implementing complex network functions, the authors of [26] identified several important parameters that affect the performance of eBPF. The work in [27] focuses on eBPF-based firewalls, which are simpler than IDSs. Hohlfeld et al. [31] examined XDP performance in virtual-machine and hardware-offloading scenarios and showed that XDP inside VMs suffers performance reductions. Snort handles string matching with a combination of the Boyer–Moore (BM) [32] and Aho–Corasick (AC) [33] algorithms. [34] proposes using AI together with eBPF to identify performance anomalies. The works [35], [36], and [37] use eBPF to build countermeasures against DoS attacks, but none of them uses ML.

Fig. 8: Data distribution of the top 15 features in the CIC-IDS-2017 dataset, with Label 0 as Benign and Label 1 as Attack.
Fig. 10: Data distribution of the top 10 features in the CIC-IDS-2017 dataset, with Label 0 as Benign and Label 1 as Attack.
Fig. 11: Data distribution of the top 10 features for DoS/DDoS in the CIC-IDS-2017 dataset, with Label 0 as Benign and Label 1 as Attack.
various types of attacks. The dataset is designed for intrusion detection and prevention research, and we use it to detect DoS/DDoS attacks. We first analyze the CIC-IDS-2017 dataset to identify the essential features on which the machine learning algorithms are trained; these features are selected with the Analysis of Variance (ANOVA) F-test. The CIC-IDS-2017 dataset is a collection of network traffic data with 78 features, such as destination port, total forward packets, total backward packets, and minimum packet length, and it is commonly used for testing intrusion detection systems. In previous work, Bachl et al. [11] used 12 features of the overall (all packets inspection) CIC-IDS-2017 dataset for the DT model deployed in eBPF. To obtain better performance, we instead used the top 15 features of both the overall dataset and its DoS/DDoS subset to train DT, RF, SVM, and TwinSVM. For SVM and TwinSVM, however, training on 15 features took much longer and consumed all of the available RAM, so we reduced the feature set to 10 features for training and testing. In summary, to detect DoS/DDoS attacks, we first analyze the dataset and then use the ANOVA F-test to extract the top 15 of the 78 features for DT and RF and the top 10 of the 78 features for SVM and TwinSVM. The ANOVA F-test identifies the features that have the most significant impact on the target variable. ANOVA is a statistical method that compares the means of multiple groups to determine whether there are substantial differences among them. For feature selection, an F-value is computed for each feature, grouping its values by class label. The F-value is the ratio of the variation between groups to the variation within groups, and features with high F-values are strongly dependent on the target variable. ANOVA F-test feature selection therefore reduces the dimensionality of the dataset.
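The F-value described above can be computed directly. The following is a minimal pure-Python sketch of the one-way ANOVA F-statistic for a single feature against the class labels, plus a helper that keeps the k highest-scoring features. It computes the same statistic as scikit-learn's `f_classif`, although this section does not say which implementation the authors used; the toy feature values and names are hypothetical.

```python
from statistics import fmean


def anova_f(values, labels):
    """One-way ANOVA F-value of one feature, grouping its values by class label.

    F = (SS_between / (K - 1)) / (SS_within / (N - K)),
    where K is the number of classes and N the number of samples.
    """
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = fmean(values)
    means = {label: fmean(g) for label, g in groups.items()}
    # Variation of the class means around the overall mean.
    ss_between = sum(len(g) * (means[label] - grand_mean) ** 2 for label, g in groups.items())
    # Variation of the samples around their own class mean.
    ss_within = sum((v - means[label]) ** 2 for label, g in groups.items() for v in g)
    if ss_within == 0:
        # No within-class variation: infinitely informative if the means differ.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_k_features(rows, labels, names, k):
    """Score every feature column with anova_f and return the k best names."""
    scores = {name: anova_f([row[j] for row in rows], labels) for j, name in enumerate(names)}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: feature_a separates the classes, feature_b does not.
labels = [0, 0, 0, 1, 1, 1]
feature_a = [1, 2, 3, 11, 12, 13]
feature_b = [1, 5, 9, 2, 5, 8]
rows = list(zip(feature_a, feature_b))
print(anova_f(feature_a, labels), anova_f(feature_b, labels))
print(top_k_features(rows, labels, ["feature_a", "feature_b"], k=1))
```

Here feature_a scores 150.0 and feature_b scores 0.0, so only feature_a is kept. With scikit-learn, the equivalent selection would be `SelectKBest(f_classif, k=15).fit(X, y)` for DT and RF, or `k=10` for SVM and TwinSVM.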
We use the top 15 or top 10 features selected by the ANOVA F-test, rather than all features of the CIC-IDS-2017 dataset, for several reasons:
1. Dimensionality reduction: the CIC-IDS-2017 dataset contains 78 features, and using too many features can lead to overfitting, which reduces the model's accuracy on unseen data.
2. Feature relevance: not all features in the dataset are equally informative or relevant for data analysis, and the ANOVA F-test helps identify those with the most significant impact on the target variable.
3. Complexity and resource constraints: including all 78 features in the eBPF implementation would increase the complexity of the code. eBPF programs are designed to run efficiently under tight resource constraints, and more features could increase memory usage and processing time and cause performance issues.
The drawback of using only the top 15 or top 10 features is that the model can detect an attack only if the characteristics of the malicious packets are captured by those features.
Algorithm 2 Proposed Random Forest Algorithm
1: Initialize children_left, children_right, threshold, feature, and value from model parameters
2: Initialize real_feature_value from packet data
3: True_count ← 0
4: False_count ← 0
5: for tree_number = 0 to n_estimators − 1 do
6:   current_node ← 0
7:   for i = 0 to MAX_TREE_DEPTH − 1 do
8:     current_left_child ← children_left[current_node, tree_number]
9:     current_right_child ← children_right[current_node, tree_number]
10:    if current_left_child = TREE_LEAF || current_right_child = TREE_LEAF then
11:      break
12:    end if
13:    current_feature ← feature[current_node, tree_number]
14:    current_threshold ← threshold[current_node, tree_number]
15:    if real_feature_value[current_feature] ≤ current_threshold then
16:      current_node ← current_left_child
17:    else
18:      current_node ← current_right_child
19:    end if
20:  end for
21:  if value[current_node, tree_number] = 1 then
22:    True_count ← True_count + 1
23:  else
24:    False_count ← False_count + 1
25:  end if
26: end for
27: prediction ← 1 if True_count > False_count else 0
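Algorithm 2 can be sketched in Python as follows. The sketch assumes each tree is exported as parallel arrays in the layout scikit-learn uses (`tree_.children_left`, `tree_.children_right`, `tree_.feature`, `tree_.threshold`), plus a hypothetical `value` array holding each node's majority class (0 = benign, 1 = attack). The dictionary layout, the `stump` helper, and the MAX_TREE_DEPTH value of 20 (the RF maximum depth stated in Section 4) are illustrative assumptions; an in-kernel version would index eBPF arrays and compare integer thresholds instead.

```python
TREE_LEAF = -1        # scikit-learn's marker for "no child", i.e. a leaf node
MAX_TREE_DEPTH = 20   # loop bound (RF max depth); eBPF only accepts bounded loops


def predict_tree(tree, x):
    """Walk one flattened tree from the root and return the class stored at the leaf."""
    node = 0  # reset to the root for every tree
    for _ in range(MAX_TREE_DEPTH):
        left = tree["children_left"][node]
        right = tree["children_right"][node]
        if left == TREE_LEAF or right == TREE_LEAF:
            break  # reached a leaf
        if x[tree["feature"][node]] <= tree["threshold"][node]:
            node = left
        else:
            node = right
    return tree["value"][node]


def predict_forest(trees, x):
    """Majority vote over all trees, as in Algorithm 2: 1 = attack, 0 = benign."""
    true_count = sum(1 for tree in trees if predict_tree(tree, x) == 1)
    false_count = len(trees) - true_count
    return 1 if true_count > false_count else 0  # ties go to benign


def stump(feature, threshold):
    """Hypothetical one-split tree: class 0 if x[feature] <= threshold, else class 1."""
    return {
        "children_left": [1, TREE_LEAF, TREE_LEAF],
        "children_right": [2, TREE_LEAF, TREE_LEAF],
        "feature": [feature, -2, -2],          # -2: unused slot at leaves
        "threshold": [threshold, -2.0, -2.0],
        "value": [0, 0, 1],                    # majority class stored per node
    }


forest = [stump(0, 10.0), stump(1, 5.0), stump(0, 100.0)]
print(predict_forest(forest, [50, 7]))  # two of the three trees vote "attack"
```

Resetting the node to the root inside the per-tree loop, and counting each tree's vote as soon as its walk ends, keeps every tree's prediction independent of the others.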
3.3 Decision Tree
We use the well-known CIC-IDS-2017 dataset [38] with the top 15 features for data analysis and prediction. We train the DT with scikit-learn, using a maximum depth of fifteen, a maximum of one thousand leaves, and a train/test split of eighty and twenty percent, respectively. Figure 8 shows the data distribution with Label 0 as “Benign” and Label 1 as “Attack”, and Figure 9 shows the data distribution for DoS/DDoS with the same labels. After training and testing the DT, we export its model parameters: left children, right children, thresholds, features, and values. The proposed DT algorithm is given in Algorithm 1. Performance on the overall (all packets inspection) dataset and on the DoS/DDoS subset is given in Tables 4 and 5.
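How the exported parameters reach the kernel is not spelled out here. One plausible approach, sketched below under stated assumptions, is to emit the flattened tree arrays as C array definitions that the eBPF source can include. In scikit-learn these arrays come from `clf.tree_.children_left`, `children_right`, `feature`, and `threshold`, with per-node classes from the argmax of `tree_.value`. Because eBPF programs cannot use floating point, the sketch stores thresholds as integers scaled by a factor; the `export_tree_header` and `c_array` helpers, the `dt_` prefix, the example tree, and the scale of 1000 are all hypothetical.

```python
import math

# Hypothetical flattened tree in scikit-learn's layout: node 0 is the root,
# -1 marks "no child" (a leaf), and -2 marks unused feature/threshold slots.
EXAMPLE_TREE = {
    "children_left": [1, -1, -1],
    "children_right": [2, -1, -1],
    "feature": [0, -2, -2],
    "threshold": [10.5, -2.0, -2.0],
    "value": [0, 0, 1],  # majority class per node: 0 = benign, 1 = attack
}


def c_array(ctype, name, values):
    """Format one C array definition, e.g. static const int x[] = {1, 2};"""
    body = ", ".join(str(v) for v in values)
    return f"static const {ctype} {name}[] = {{{body}}};"


def export_tree_header(tree, prefix="dt", scale=1000):
    """Render a flattened tree as C arrays that an eBPF source file could include.

    Each threshold t is stored as floor(t * scale). For an integer scaled
    feature n = x * scale, n <= floor(t * scale) holds exactly when
    n <= t * scale, so the kernel side must scale features by the same factor.
    """
    lines = [c_array("int", f"{prefix}_{field}", tree[field])
             for field in ("children_left", "children_right", "feature", "value")]
    scaled = [math.floor(t * scale) for t in tree["threshold"]]
    lines.append(c_array("long long", f"{prefix}_threshold", scaled))
    return "\n".join(lines)


header = export_tree_header(EXAMPLE_TREE)
print(header)
```

Scaling with floor rather than truncation keeps the comparison exact for negative thresholds as well.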
Label 1 is “Attack” is shown in Figure 8, and the data distribution for DoS/DDoS with Label 0 as “Benign” and Label 1 as “Attack” is shown in Figure 9. After training and testing RF, we export the model parameters of each DT in the forest: left children, right children, thresholds, features, and values. The proposed RF algorithm is given in Algorithm 2. Performance on the overall (all packets inspection) dataset and on the DoS/DDoS subset is shown in Tables 4 and 5.
Fig. 12: Confusion matrices for (a) Decision Tree, (b) Random Forest, (c) SVM, and (d) TwinSVM on the overall (all packets inspection) CIC-IDS-2017 dataset.
and later convert them into classes using a function such as sigmoid or tanh. Because the final output is a non-linear transformation of weighted sums, a fixed-point representation cannot be used for NNs and XGBoost. The proposed SVM and TwinSVM algorithms are given in Algorithms 3 and 4. Performance on the overall (all packets inspection) dataset and on the DoS/DDoS subset is shown in Tables 4 and 5.
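The passage above implies that SVM and TwinSVM, unlike NNs and XGBoost, can be evaluated with a fixed-point representation, which matters because eBPF has no floating point. The Python sketch below illustrates that idea under loudly stated assumptions: linear kernels, a hypothetical 16-bit fractional scale, and integer packet features. For TwinSVM it uses the standard decision rule, which assigns a sample to the class whose hyperplane is nearer, comparing squared, cross-multiplied distances so that no division or square root is needed. The scale, parameter values, and function names are illustrative, not the authors' implementation.

```python
SCALE_BITS = 16               # hypothetical fixed-point format: real value * 2**16
SCALE = 1 << SCALE_BITS


def to_fixed(v):
    """Convert a trained float parameter to a fixed-point integer (done offline)."""
    return round(v * SCALE)


def dot_fixed(w_fx, x):
    """Integer dot product of fixed-point weights with integer packet features."""
    return sum(w * xi for w, xi in zip(w_fx, x))


def svm_predict(w_fx, b_fx, x):
    """Linear SVM: 1 (attack) if w.x + b > 0, else 0 (benign), in integers only.

    Weights and bias share the same 2**16 scale, so comparing with 0 matches
    the float decision up to the rounding of the parameters.
    """
    return 1 if dot_fixed(w_fx, x) + b_fx > 0 else 0


def twinsvm_predict(benign_plane, attack_plane, x):
    """TwinSVM: choose the class whose hyperplane is nearer to x.

    The distance to plane k is |w_k.x + b_k| / ||w_k||, so
    d0 <= d1  <=>  f0**2 * ||w1||**2 <= f1**2 * ||w0||**2,
    which avoids division and square roots. Python integers never overflow;
    an eBPF version would need scales and feature ranges that fit in 64 bits.
    """
    (w0, b0), (w1, b1) = benign_plane, attack_plane
    f0 = dot_fixed(w0, x) + b0
    f1 = dot_fixed(w1, x) + b1
    norm0 = sum(w * w for w in w0)
    norm1 = sum(w * w for w in w1)
    return 0 if f0 * f0 * norm1 <= f1 * f1 * norm0 else 1


# Hypothetical parameters for two features.
w_fx, b_fx = [to_fixed(0.5), to_fixed(-0.25)], to_fixed(-1.0)
benign_plane = ([to_fixed(1.0), to_fixed(0.0)], to_fixed(-1.0))   # plane x0 = 1
attack_plane = ([to_fixed(1.0), to_fixed(0.0)], to_fixed(-10.0))  # plane x0 = 10
print(svm_predict(w_fx, b_fx, [4, 2]), twinsvm_predict(benign_plane, attack_plane, [9, 0]))
```

Squaring both sides of the distance comparison is what keeps the TwinSVM rule in pure integer arithmetic; ties are resolved toward benign.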
4 Performance Analysis
The performance parameters of the ML algorithms are accuracy, precision, recall/sensitivity, F1-score, and specificity. Using the CIC-IDS-2017 dataset with 12 features, Bachl et al. [11] train a DT with scikit-learn with a maximum depth of ten, a maximum of one thousand leaves, and a train/test split of 2:1, which yields an accuracy of 99.0% on the test set. We use the same CIC-IDS-2017 dataset, extract the top 15 features with the ANOVA F-test, and train the DT with scikit-learn with a maximum depth of 15 and a train/test split of 4:1. Table 8 compares accuracy and packet-processing rates in user space and in kernel space. As the table shows, our DT reaches a test accuracy of 99.38%, higher than the accuracy attained in [11] (existing related work). RF performs better still, with an accuracy of 99.44%, compared with the DT and SVM models. For DoS/DDoS detection, the accuracy was found to be 99.57%.
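Each of these performance parameters is a standard function of the confusion-matrix counts: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN), with the attack class taken as positive. A short Python sketch follows; the counts in the example are hypothetical, and the last line checks the F1 formula against the DT test precision and recall in Table 4.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (any consistent unit, e.g. % or 0-1)."""
    return 2 * precision * recall / (precision + recall)


def metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall/sensitivity, F1-score, and specificity.

    The attack class is treated as positive: tp = attacks flagged as attacks,
    fp = benign flagged as attacks, tn = benign passed, fn = attacks missed.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": f1(precision, recall),
        "specificity": tn / (tn + fp),
    }


m = metrics(tp=90, fp=10, tn=880, fn=20)  # hypothetical confusion-matrix counts
print({name: round(value, 4) for name, value in m.items()})
print(round(f1(99.46, 98.21), 2))  # DT test precision/recall from Table 4
```

For the hypothetical counts, accuracy is 0.97, precision 0.90, and F1 is 6/7 (about 0.857); `f1(99.46, 98.21)` rounds to 98.83, matching the DT test F1 score in Table 4.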
RF is an ensemble learning method for classification and regression that can detect DoS/DDoS attacks by analyzing network traffic patterns and identifying unusual behavior that may indicate an attack. We trained RF on the CIC-IDS-2017 dataset with a train/test split of 4:1 and a maximum depth of 20, and then exported the RF model parameters. This approach uses eBPF and socket filters for DoS/DDoS mitigation: an RF classifier implemented in eBPF classifies incoming network packets in real time. The eBPF program attached to the socket hook filters and records the necessary traffic features and can pass the data to user space. After the algorithm makes its prediction, eBPF can take the required action, such as dropping the packets or redirecting them to a DDoS mitigation system. RF achieves better performance than DT, SVM, and TwinSVM, with an accuracy of 99.44% on the overall (all packets inspection) dataset and 99.58% for DoS/DDoS detection.
Fig. 13: Confusion matrices for (a) Decision Tree, (b) Random Forest, (c) SVM, and (d) TwinSVM for DoS/DDoS in the CIC-IDS-2017 dataset.
Table 4: Experimental results (%) on the overall (all packets inspection) CIC-IDS-2017 dataset in user space
Performance parameter | Split | DT | RF | SVM | TwinSVM
Accuracy | Train | 99.52 | 99.59 | 88.77 | 93.87
Accuracy | Test | 99.38 | 99.44 | 88.74 | 93.82
Precision | Train | 99.71 | 99.74 | 78.97 | 98.58
Precision | Test | 99.46 | 99.51 | 78.97 | 98.49
Recall/Sensitivity | Train | 98.49 | 98.72 | 73.41 | 75.9
Recall/Sensitivity | Test | 98.21 | 98.37 | 73.26 | 75.78
F1 Score | Train | 99.09 | 99.23 | 76.09 | 85.76
F1 Score | Test | 98.83 | 98.94 | 76.01 | 85.65
Specificity | Train | 99.89 | 99.91 | 93.71 | 99.65
Specificity | Test | 99.81 | 99.82 | 93.73 | 99.63
Tables 6 and 7, the user-space implementation processes fewer packets per second than the eBPF version.
5 Declarations
Not Applicable
Table 6: Packet-processing rate in user space vs. kernel space (eBPF) on the overall (all packets inspection) CIC-IDS-2017 dataset
Algorithm | User space, mean (packets/s) | eBPF, mean (packets/s)
Decision Tree | 46239 | 109691
Random Forest | 45978 | 108534
SVM | 45590 | 92978
TwinSVM | 38430 | 109865
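Tables 6 and 8 report throughput rather than speedup. Dividing the eBPF rate by the user-space rate gives the speedup directly; the Python sketch below does this for the four proposed models and, for comparison, for the rates reported by [11] in Section 2. The names are illustrative.

```python
# Mean packets per second (user space, eBPF): Table 6, plus the rates reported for [11].
THROUGHPUT = {
    "Decision Tree": (46239, 109691),
    "Random Forest": (45978, 108534),
    "SVM": (45590, 92978),
    "TwinSVM": (38430, 109865),
    "[11] Decision Tree": (125420, 152274),
}


def speedup(userspace_pps, ebpf_pps):
    """How many times more packets per second the eBPF version processes."""
    return ebpf_pps / userspace_pps


for name, (user, ebpf) in THROUGHPUT.items():
    print(f"{name:20s} {speedup(user, ebpf):.2f}x")
```

This gives roughly 2.37x (DT), 2.36x (RF), 2.04x (SVM), and 2.86x (TwinSVM) for the proposed models, versus about 1.21x for [11].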
Table 8: Accuracy and packet-processing rates in user space and eBPF, compared with related work
Author | Algorithm | Accuracy (%) | User space (packets/s) | eBPF (packets/s)
[11] | Decision Tree | 99 | 125420 | 152274
Proposed | Decision Tree | 99.38 | 46239 | 109691
Proposed | Random Forest | 99.44 | 45978 | 108534
Proposed | SVM | 88.74 | 45590 | 92978
Proposed | TwinSVM | 93.82 | 38430 | 109865
related work: 99.38, 99.44, 88.73, and 93.82, respectively. As future work, an IDS can be built that combines eBPF and XDP with ML algorithms such as DT, RF, SVM, and TwinSVM. XDP is a programmable data path in the Linux kernel that allows efficient filtering and manipulation of network packets at the Network Interface Card (NIC) driver level. XDP programs are eBPF programs attached to network devices; they run before the packet reaches the kernel network stack, at the driver level or, with hardware offload, on the NIC itself, and can act on the packet directly.
References
[1] LLC, M.: eBPF Documentation. https://ebpf.io/what-is-ebpf/ Accessed 2023-06-19
[2] Vieira, M.A., Castanho, M.S., Pacı́fico, R.D., Santos, E.R., Júnior, E.P.C., Vieira,
L.F.: Fast packet processing with ebpf and xdp: Concepts, code, challenges, and
applications. ACM Computing Surveys (CSUR) 53(1), 1–36 (2020)
[3] Høiland-Jørgensen, T., Brouer, J.D., Borkmann, D., Fastabend, J., Herbert, T., Ahern, D., Miller, D.: The express data path: Fast programmable packet processing in the operating system kernel. In: Proceedings of the 14th International Conference on Emerging Networking Experiments and Technologies, pp. 54–66 (2018)
[4] Sharaf, H., Ahmad, I., Dimitriou, T.: Extended berkeley packet filter: An
application perspective. IEEE Access (2022)
[5] Lattner, C., Adve, V.: The llvm compiler framework and infrastructure tutorial,
15–16 (2005). Springer
[7] Rybczynska, M.: Bounded loops in bpf for the 5.3 kernel (2019)
[10] Maguire, A.: Notes on BPF (1)—A Tour of Program Types (2019)
[11] Bachl, M., Fabini, J., Zseby, T.: A flow-based ids using machine learning in ebpf.
arXiv preprint arXiv:2102.09980 (2021)
[12] Wang, S.-Y., Chang, J.-C.: Design and implementation of an intrusion detec-
tion system by using extended bpf in the linux kernel. Journal of Network and
Computer Applications 198, 103283 (2022)
[13] Bosshart, P., Daly, D., Gibb, G., Izzard, M., McKeown, N., Rexford, J.,
Schlesinger, C., Talayco, D., Vahdat, A., Varghese, G., et al.: P4: Programming
protocol-independent packet processors. ACM SIGCOMM Computer Communi-
cation Review 44(3), 87–95 (2014)
[14] Roesch, M.: Snort users manual. http://www.snort.org (2002)
[15] Leblond, É., Manev, P.: Introduction to eBPF and XDP support in Suricata
(2019)
[16] Baidya, S., Chen, Y., Levorato, M.: ebpf-based content and computation-aware
communication for real-time edge computing. In: IEEE INFOCOM 2018-IEEE
Conference on Computer Communications Workshops (INFOCOM WKSHPS),
pp. 865–870 (2018). IEEE
[17] Findlay, W.: Extended berkeley packet filter for intrusion detection implementa-
tions. PhD thesis, Honours Thesis Proposal, Carleton University (2019)
[18] Xhonneux, M., Duchene, F., Bonaventure, O.: Leveraging ebpf for programmable network functions with ipv6 segment routing. In: Proceedings of the 14th International Conference on Emerging Networking EXperiments and Technologies, pp. 67–72 (2018)
[19] Viljoen, N., Kicinski, J.: Using ebpf as an abstraction for switching. URL http://vger.kernel.org/lpc_net2018_talks/eBPF_For_Switches.pdf (2018)
[20] Miano, S., Bertrone, M., Risso, F., Bernal, M.V., Lu, Y., Pi, J.: Securing linux
with a faster and scalable iptables. ACM SIGCOMM Computer Communication
Review 49(3), 2–17 (2019)
[22] Lazri, K., Blin, A., Sopena, J., Muller, G.: Toward an in-kernel high performance
key-value store implementation. In: 2019 38th Symposium on Reliable Distributed
Systems (SRDS), pp. 268–2680 (2019). IEEE
[24] Miano, S., Doriguzzi-Corin, R., Risso, F., Siracusa, D., Sommese, R., CREATE-
NET, F.B.K.: High-performance server-based ddos mitigation through pro-
grammable data planes
[25] Ahmed, Z., Alizai, M.H., Syed, A.A.: Inkev: In-kernel distributed network vir-
tualization for dcn. ACM SIGCOMM Computer Communication Review 46(3),
1–6 (2018)
[26] Miano, S., Bertrone, M., Risso, F., Tumolo, M., Bernal, M.V.: Creating complex
network services with ebpf: Experience and lessons learned. In: 2018 IEEE 19th
International Conference on High Performance Switching and Routing (HPSR),
pp. 1–8 (2018). IEEE
[27] Scholz, D., Raumer, D., Emmerich, P., Kurtz, A., Lesiak, K., Carle, G.: Perfor-
mance implications of packet filtering with linux ebpf. In: 2018 30th International
Teletraffic Congress (ITC 30), vol. 1, pp. 209–217 (2018). IEEE
[28] Parola, F., Risso, F., Miano, S.: Providing telco-oriented network services with
ebpf: the case for a 5g mobile gateway. In: 2021 IEEE 7th International Conference
on Network Softwarization (NetSoft), pp. 221–225 (2021). IEEE
[29] Hong, J., Jeong, S., Yoo, J.-H., Hong, J.W.-K.: Design and implementation of
ebpf-based virtual tap for inter-vm traffic monitoring. In: 2018 14th International
Conference on Network and Service Management (CNSM), pp. 402–407 (2018).
IEEE
[30] Miano, S., Risso, F., Bernal, M.V., Bertrone, M., Lu, Y.: A framework for ebpf-based network functions in an era of microservices. IEEE Transactions on Network and Service Management 18(1), 133–151 (2021)
[31] Hohlfeld, O., Krude, J., Reelfs, J.H., Rüth, J., Wehrle, K.: Demystifying the
performance of xdp bpf. In: 2019 IEEE Conference on Network Softwarization
(NetSoft), pp. 208–212 (2019). IEEE
[32] Boyer, R.S., Moore, J.S.: A fast string searching algorithm. Communications of
the ACM 20(10), 762–772 (1977)
[33] Aho, A., Corasick, M.: Efficient string matching. Comm. ACM 18(6), 333
[34] Ben-Yair, I., Rogovoy, P., Zaidenberg, N.: Ai & ebpf based performance anomaly
detection system. In: Proceedings of the 12th ACM International Conference on
Systems and Storage, pp. 180–180 (2019)
[35] Demoulin, H.M., Pedisich, I., Vasilakis, N., Liu, V., Loo, B.T., Phan, L.T.X.:
Detecting asymmetric application-layer denial-of-service attacks in-flight with
finelame. In: USENIX Annual Technical Conference, pp. 693–708 (2019)
[36] Wieren, H.: Signature-based ddos attack mitigation: Automated generating rules
for extended berkeley packet filter and express data path. Master’s thesis,
University of Twente (2019)
[37] Choe, Y., Shin, J.-S., Lee, S., Kim, J.: ebpf/xdp based network traffic visu-
alization and dos mitigation for intelligent service protection. In: Advances in
Internet, Data and Web Technologies: The 8th International Conference on
Emerging Internet, Data and Web Technologies (EIDWT-2020), pp. 458–468
(2020). Springer
[38] Panigrahi, R., Borah, S.: A detailed analysis of cicids2017 dataset for designing
intrusion detection systems. International Journal of Engineering & Technology
7(3.24), 479–482 (2018)