
Computer Communications 202 (2023) 191–203

Contents lists available at ScienceDirect

Computer Communications
journal homepage: www.elsevier.com/locate/comcom

Anomaly detection for fault detection in wireless community networks using machine learning

Llorenç Cerdà-Alabern a,∗, Gabriel Iuhasz b, Gabriele Gemmi a,c

a Universitat Politècnica de Catalunya, Barcelona, Spain
b West University, Timisoara, Romania
c University of Venice Ca’ Foscari, Italy

ARTICLE INFO

Keywords:
Fault detection
Anomaly detection
Machine learning
Wireless network dataset
Wireless community networks

ABSTRACT

Machine learning has received increasing attention in computer science in recent years and many types of methods have been proposed. In computer networks, little attention has been paid to the use of ML for fault detection, the main reason being the lack of datasets. This is motivated by the reluctance of network operators to share data about their infrastructure and network failures. In this paper, we attempt to fill this gap using anomaly detection techniques to discern hardware failure events in wireless community networks. For this purpose we use 4 unsupervised machine learning (ML) approaches based on different principles. We have built a dataset from a production wireless community network, gathering traffic and non-traffic features, e.g. CPU and memory. For the numerical analysis we investigated the ability of the different ML approaches to detect an unprovoked gateway failure that occurred during data collection. Our numerical results show that all the tested approaches detect the gateway failure better when non-traffic features are also considered. We see that, when properly tuned, all ML methods are effective at detecting the failure. Nonetheless, using decision boundaries and other analysis techniques we observe significantly different behavior among the ML methods.

1. Introduction

Anomaly detection (AD) is aimed at identifying deviations from the expected behavior [1]. The identification of these unusual behaviors has proven to be a powerful tool in a wide range of applied sciences. Some examples are credit-card fraud detection, medical diagnosis, industrial processes, and computer networks [2]. Although different methods can be applied to AD [3], recent advances in ML methods and their ability to learn from data have boosted the number of AD proposals using ML techniques [4].

Machine learning uses a dataset to make probabilistic predictions [5,6]. A group of dataset samples, referred to as the training set, is used to train the algorithm. Then, predictions are made over a different group of samples, referred to as the testing set. In some applications the training set is labeled. For instance, in optical character recognition the training set consists of images and the labels are their corresponding characters. These problems are called supervised learning. In contrast, in AD an unlabeled dataset is usually used, and the ML algorithms are called unsupervised. In AD the training set is simply used as a representation of the expected normal operation, and anomalies are identified by deviations from the expected behavior.

In the field of computer networks, AD has been primarily concerned with security, especially network intrusion detection [3,7,8]. In contrast, in this paper we focus on fault detection, another essential need in computer networks where AD can also be of great interest, but which has received little attention. Some difficulties explain the few works that can be found in the literature on AD for fault detection. The main one is the lack of datasets. The reason is that a realistic dataset for fault detection would consist of features related to network traffic and hardware metrics, such as CPU load and memory usage. This kind of dataset would be difficult to generate by simulation, but should be obtained from a real testbed or, ideally, from a production network. However, for confidentiality reasons commercial network operators do not make these types of datasets of their networks public.

Another goal of our work is therefore to create a dataset with real data. To do so we will focus on AD in Wireless Community Networks (WCN). Community networking, also known as bottom-up networking, is a model that emerged from the need for broad Internet access in under-provisioned zones, typically rural areas and developing countries. Nowadays there are hundreds of community networks operating in very diverse ways [9]. WCNs are non-profit networks built by their own users, who normally install wireless antennas on top of their houses. Sometimes WCN users not only build the network infrastructure, but also associate to constitute micro ISPs, providing access to the Internet and internal services managed by the community.

∗ Corresponding author.
E-mail address: [email protected] (L. Cerdà-Alabern).

https://doi.org/10.1016/j.comcom.2023.02.019
Received 20 September 2022; Received in revised form 18 January 2023; Accepted 20 February 2023
Available online 23 February 2023
0140-3664/© 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license
(http://creativecommons.org/licenses/by/4.0/).

Fig. 1. GuifiSants topology. Colors indicate links configured in the same WiFi channel, and outdoor routers.

WCNs are similar to Wireless Internet Service Providers (WISPs) [10]: typically small businesses with reduced profit margins and the aim of providing broadband Internet to their community.

The non-profit and shared resources of community networks typically make their users open to participating in research projects, and to providing access to data from their network infrastructure. We have thus focused on WCNs with the objective of building a dataset of a production network. In particular we have considered Guifi.net, one outstanding WCN example. It started in 2004 and, at the time of writing, reports 36,886 working nodes [11]. Guifi.net has become a complex network where associations of self-providing users coexist with commercial network operators [12].

In this paper, we analyze a dataset we have gathered from a production WCN which is part of Guifi.net. This WCN has been deployed in a quarter of Barcelona, Spain, and is called GuifiSants [13]. When the dataset was gathered there were around 60 nodes. GuifiSants is highly heterogeneous in its topology, the quality of its links, and node behavior. There are nodes located at higher buildings having several antennas, some sector and others parabolic, creating high capacity point-to-point links. These nodes form a kind of unplanned backbone. Other, non-technical users have end nodes made of a single antenna. Fig. 1 shows the geographic locations of the GuifiSants nodes, and a picture of the outdoor routers deployed in one node. Depending on the antenna types, the obstruction of the links, and the 802.11n/ac WiFi technologies used, the link capacity varies from a few Mbps to several hundred. The stability of the nodes is also rather variable. For instance, there are users that frequently update the software, or reboot their node if the connection is not working satisfactorily. Backbone nodes are more stable and have few failures or reboots. Being a mesh network, the connectivity is rather resilient: even if a backbone node fails, the routing protocol re-configures alternative routes and most nodes can still reach the Internet, which is the main service consumed by the users. AD in such a WCN is challenging due to its heterogeneous and diverse nature. Nevertheless, we believe it is not only an interesting study in its own right, but other scenarios such as large IoT networks may share similar diversity features, allowing us to extrapolate some of the lessons learned from the WCN analysis.

Most AD studies related to computer networks found in the literature use traffic-related features [14–16]. In contrast, the dataset we have produced contains not only traffic, but also features from the Linux kernel that reflect e.g. the state of the CPU and memory. Such features are typically gathered by network monitoring tools such as Munin [17] and Nagios [18]. These tools, however, simply produce time-series graphics of the gathered features, and their analysis and interpretation are left to the network administrator. We are thus interested in whether adding these features to the basic traffic-based ones can increase the AD capability.

Summing up, our main contributions in this paper are the following:

• We have produced a dataset gathering a large set of features related not only to traffic, but also to other parameters such as CPU and memory. The dataset has been gathered from a production WCN and we have made it available in the public Zenodo repository [19]. To the best of our knowledge, this is the first dataset containing such a rich diversity of features obtained from a production wireless network available in a public repository.

• We have applied AD methods to the dataset to perform a fault detection analysis using 4 unsupervised ML approaches based on different principles. We investigate the performance of these well-known ML methods applied to AD in wireless networks and their behavior by varying some system parameters, such as the size of the dataset.

The rest of the paper is organized as follows: Section 2 provides some related works. Section 3 summarizes the 4 ML approaches used in this paper. Section 4 describes the methodology and the features gathered in the dataset under study. Section 5 provides the numerical results and Section 6 concludes the paper.

2. Related work

Anomaly detection refers to the problem of finding patterns in data that do not conform to expected behavior [1]. In statistics, the term outlier is often used for this concept. The importance of AD is due to the fact that anomalies in data might reveal critical misbehavior in a wide number of scenarios: for instance, fraud detection for credit cards, insurance or health care, damage detection, etc. The significance of this topic has motivated a large number of articles, surveys, and even books. For instance, the surveys [1,20] provide an extensive review of detection techniques developed in a wide number of domains.

In computer networks, most work on AD has traditionally focused on the problem of security. In this context the term Intrusion Detection System (IDS) is often used. IDSs typically exploit anomalies in network traffic to detect a wide number of security attacks, such as denial of service (DoS), reconnaissance, etc. See e.g. [21] for a description of different types of security attacks. An overview of IDSs proposed in the literature can be found in [7]. In [22], PCA (Principal Component Analysis) was introduced to perform a structural analysis of network traffic. Their proposal received significant attention for IDS. For instance, in [23,24] refinements to the PCA analysis are made to detect spikes in traffic flows, which potentially identify network attacks. In [25] the authors generate port-scan and snapshot attacks in a lab, merged with residential traffic; attack detection is done using a robust PCA approach. PCA to detect network traffic anomalies has also been criticized by a number of papers [26]. In [27] the weaknesses of PCA are analyzed, and Commute Distance is used to detect DoS attacks using traffic measures. In [14] it is argued that the problems with PCA are due to flaws in its adoption, and appropriate ways to apply PCA are proposed.

ML methods have a long tradition of being used for AD. Until recently most research was done using supervised ML methods coupled with synthetic or heavily annotated datasets [28]. Recently this trend has shifted towards utilizing real-life datasets coupled with unsupervised methods.


Even so, there are not many research articles that have both a comparison of unsupervised AD methods and a freely available real-life dataset.

A very good overview of unsupervised methods on several datasets is given in [29]. There we see the performance of several well-known AD methods such as k-NN, CBLOF, etc. More closely related research, including some of the same methods, has been done in [30], where the authors compare several clustering-based AD techniques on network traffic data. Other types of AD methods have also been used in network management systems with high dimensional network data [31].

Deep learning based methods are also utilized for AD. In [32] the authors provide a comprehensive overview of different deep learning methods for AD, including unsupervised methods. They also correctly point out that explainability is a key issue, especially when using AD methods which are traditionally considered black-box models. The problem of explainability is also discussed in [33,34] in the case of different deep learning models, which include Autoencoders [35] and Variational Autoencoders [36], respectively.

Most of the research done regarding unsupervised methods for AD is largely focused on network intrusion detection. In [37] there is one of the few works found in the literature where network AD is investigated using ML with a real dataset. However, in [37] the authors use AD to study the behavior of TCP traces to investigate the performance of a 4G cellular network, which is quite a different scenario than ours. The methods used in the literature can be divided into several types. These include clustering based methods [38–41], outlier based techniques [37,42–45] and soft computing based techniques [46–48]. All of the methods described here have both pros and cons, in addition to the problem of explainability already mentioned before. Clustering based methods usually can handle only continuous features, and inappropriate proximity measures adversely affect the detection rate. Most outlier based methods are complex (typically used in conjunction with clustering) and are highly parameter dependent, while soft computing based methods usually require large quantities of historical data, with overfitting of models being arguably a greater issue than for supervised methods. In essence, the selection of which type of method and what training parameters to use is very much dependent on the problem domain and the available data.

3. Machine learning based approaches

In the following section, we will focus on testing several unsupervised ML techniques for AD. For ML techniques, AD is split up into several categories based on the type of methods and the characteristics of the available data. The simplest form of anomalies are point anomalies, which can be characterized by only one feature, making them easier to detect. Other types of anomalies are more complex but ultimately yield a much deeper understanding of the inner workings of a monitored system and/or application [1]. These types of anomalies are fairly common in complex, geographically distributed systems.

Contextual anomalies are extremely interesting in the case of complex systems. These types of anomalies happen when a certain pattern of feature values is encountered. In isolation, these values are not anomalous, but when viewed in context they represent an anomalous event. These types of anomalies can represent application bottlenecks, imminent hardware failure, software misconfiguration, or even malicious activity. The last major types of anomalies, which are relevant here, are temporal and sequential anomalies, where a certain event takes place out of order or at incorrect times. These types of anomalies are very important in systems that have a strong spatial–temporal relationship between features, which is very much the case in dynamic distributed systems such as mesh networks.

One important consideration for most AD tasks is the fact that labeled high-quality data is rarely available. Moreover, supervised methods trained on labeled data will not be able to detect new or unforeseen anomalous events. Unsupervised methods do not have this limitation; however, these methods have several downsides. First, they are in many cases prone to a high false-positive rate. This can be due to many factors such as overfitting, large feature spaces, etc. Second, even if anomalies are detected, the algorithms themselves are usually unable to give any meaningful insight into what type of anomaly has been detected or even what caused the anomalous event. In this section of our paper, we will focus on alleviating some of the issues presented here in the case of unsupervised detection methods. Note that we shall refer to each measurement of the dataset as a sample. Thus, we shall use the terms anomaly event, anomaly sample or simply anomaly interchangeably.

3.1. Principal component analysis

Principal Component Analysis (PCA) is a standard methodology applied to Statistical Process Control [14]. The idea of PCA is to approximate the samples of n features by a projection on a lower dimensional space of dimension l < n. PCA is well studied in the literature, so we omit the description of this method and only discuss the statistics we used for our analysis. The interested reader can refer to [14] for further details.

In order to detect anomalies of a new score there are several indices [14,49]. We have found that in our study the best performing one is the squared prediction error (SPE), also known as the Q statistic:

    Q_j = \sum_{i=1}^{n} e_{ji}^2        (1)

where e_{ji} is the i-th feature residual of sample j.

Once an anomaly is detected, a diagnosis system is needed to determine its root causes. The general approach is to use contribution plots [50]. The idea of such plots is to estimate the contribution of each observed feature of a sample to a particular statistic value considered anomalous. Several methods have been proposed to build contribution plots [49]. In this paper, we shall use complete decomposition contributions (CDC). CDCs are easily computed and show a good diagnosis performance in a network monitoring system [14]. The CDC for the Q statistic defined above is given by [49]:

    CDC_i = \frac{e_{ji}^2}{Q_j}        (2)

3.2. Isolation Forest

In recent years, ensemble based detectors have garnered popularity in the case of AD. These methods combine the outputs of multiple algorithms, called base detectors, which are then used to create a unified output. These algorithms operate on the assumption that some algorithms perform well on a subset of the available data while others perform better on a different subset. An ensemble combination often outperforms the individual estimators because of its ability to combine the outputs of multiple algorithms. Most ensemble methods require labeled data; however, it has been shown both from a theoretical [51] and a practical [52] point of view that unsupervised ensemble methods share characteristics with their supervised counterparts. We can formulate a modified bias–variance trade-off for the anomaly analysis setting [51]; this enables many of the supervised algorithms to be ‘‘generalized’’ into unsupervised tasks. One of the key issues caused as a direct result of this generalization is the increased difficulty in selecting appropriate hyper-parameters. We should also note that ensemble models are useful when dealing with difficult algorithmic choices by reducing the uncertainty inherent in these.

Isolation Forest is an outlier ensemble based algorithm which is constructed from multiple isolation trees [53]. It explores random subspaces from the data. In essence, it explores random local subspaces, as each tree uses different splits. Scoring is done by quantifying how easy it is to find a local subspace of low dimensionality in which a particular event is isolated [54]. In other words, the distance from the leaf to the root is used as the outlier score.
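The statistics of Eqs. (1)–(2) in Section 3.1 are easy to compute once a PCA basis is fitted. The following NumPy sketch is our own illustration, not the authors' code; the helper names (`fit_pca`, `spe_and_cdc`) and the synthetic data are invented for the example:

```python
import numpy as np

def fit_pca(X_train, l):
    """Fit a rank-l PCA basis on the rows of X_train (samples x features)."""
    mu = X_train.mean(axis=0)
    # Principal directions are the right singular vectors of the centered data.
    _, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
    P = Vt[:l].T                            # n x l loading matrix
    return mu, P

def spe_and_cdc(x, mu, P):
    """Q statistic (Eq. 1) and per-feature CDC contributions (Eq. 2) for one sample."""
    v = x - mu
    e = v - P @ (P.T @ v)                   # residual after projection on the PCA subspace
    Q = float(np.sum(e ** 2))               # squared prediction error (SPE)
    cdc = e ** 2 / Q                        # relative contribution of each feature to Q
    return Q, cdc

# Synthetic data: feature 3 is almost a copy of feature 0, giving low-dimensional structure.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
X[:, 3] = X[:, 0] + 0.01 * rng.normal(size=500)
mu, P = fit_pca(X, l=2)

x_anom = X[0].copy()
x_anom[3] += 5.0                            # break the correlation -> large residual
Q, cdc = spe_and_cdc(x_anom, mu, P)
```

A sample that violates the correlation structure learned by PCA gets a large Q, and the CDC vector indicates which features carry the residual.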


Similar to Random Forest, a supervised method, the final score is obtained by averaging the path length of any particular data point in the different isolation trees. In most scenarios, Isolation Forest works under the assumption that it is more likely to detect or isolate an outlier in a subspace of lower dimensionality created by the random splits.

During the training phase, Isolation Forest constructs the unsupervised equivalent of decision trees. These trees are binary and have at most N leaf nodes, where N is equal to the number of data points in the training set. Thus the root node contains all of the data points to be processed and serves as the initial state of the isolation tree T. Next, a candidate list C is initialized containing the root node. From this list, a node denoted as R is randomly chosen. A randomly chosen attribute i is used to split R into two sets R1 and R2. This is done by selecting a random value a from attribute i such that all data points in R1 satisfy x_i ≤ a and all data points in R2 satisfy x_i > a. This value a is chosen uniformly at random between the minimum and maximum values of the i-th attribute among the data points in node R. Both R1 and R2 are children of R in T. In case R1 or R2 contains more than one point we add it to C; otherwise we designate the node as a leaf. We repeat this process until C is empty.

We can see that the result of training will yield a non-balanced binary tree and that outlier nodes will typically be isolated at a lower dimensionality than normal points. The inherently randomized approach is repeated multiple times and the results are averaged (ensembled), with a computational complexity of Θ(N log(N)) and a space complexity of O(N). Isolation trees create hierarchical clusters from the data; however, they are extremely randomized, as can be seen from the previous paragraphs. This leads to some issues when using this method on real-world datasets. By using subsampling it is possible to improve computational performance and aid in creating diversity, which is crucial for ensemble methods [52]. However, diversity is also gained through the randomized process of isolation tree construction, arguably even more so than through subsampling. Thus, the selection of the features or attributes i is very heavily influenced by randomness. It is entirely possible to obtain poor-performing models because of poor feature selection during training.

3.3. Clustering-based local outlier factor

There are several underlying assumptions that stand at the basis of unsupervised methods. Proximity-based techniques define an anomalous event (or data point) as one whose locality is sparsely populated. This proximity can be defined in several ways. In cluster-based methods, the non-membership of an event relative to a cluster, its distance from other clusters, the size of the closest cluster, or a combination of these factors are used to calculate an anomaly score. Some methods are heavily cluster based: basically anything which cannot be assigned to a cluster is considered an anomaly.

Distance-based methods base their anomaly score on the distance of an event to its k-nearest neighbors. In this case events with high k-nearest neighbor distances are flagged as anomalous. Distance-based methods usually have a greater granularity, which comes with an increased computational cost. Density-based methods work based on the number of events that are in a local region; this is then used to define the local density and compute the anomaly score. Basically, density-based methods partition the data space, while clustering-based methods partition the data points.

Clustering-Based Local Outlier Factor (CBLOF) [55] is a proximity based algorithm which is a combination of the local outlier factor (LOF) and a clustering technique [56]. LOF adjusts the anomaly (or outlier) score based on the local density. The density is defined as an inverse of average distances. This approach results in events in local regions of high density being given higher anomaly scores even if they are isolated from the other events in their locality. This is mainly due to the definition of density, which does not contain the number of anomalies in a cluster. In fact, CBLOF is a score in which anomalies are defined as a combination of the local distance to nearby clusters and the size of the clusters to which each event belongs. Thus events in small clusters that are at large distances from nearby clusters are flagged as anomalies.

3.4. Variational AutoEncoders

Variational AutoEncoders (VAE) are deep neural network models designed for unsupervised training, which can be used for AD tasks [57,58]. They are often mentioned together with Autoencoders (AE), which are also deep learning models with seemingly similar topological components: an encoder and a decoder. The encoder tries to learn a lower-dimensional representation of the input data (similar to PCA), and the decoder attempts to reproduce the input data in the original dimension (AEs are usually symmetrical). AEs try to encode the data in such a way that they reduce the reconstruction error. When used for AD, the AE reconstruction error can be used as a form of anomaly score [59], provided the AE has sufficient training data to achieve a minimal reconstruction error for normal data.

VAEs, on the other hand, try to encode the data into a multivariate latent distribution which is then used by the decoder. The difference between AE and VAE is that instead of generating a latent vector which the decoder can reproduce, VAEs learn two vectors that represent the mean and variance parameters of a distribution from which the latent vector is sampled and used by the decoder function for reconstructing the original input. In general, the latent spaces generated by VAEs tend towards normality due, in no small part, to the fact that the encoders are heavily regularized (via a Kullback–Leibler divergence term).

VAEs have some issues in common with other deep learning models. First, they are relatively slow to train, often requiring specialized hardware such as GPGPUs. Large datasets also require complex network topologies. However, if complex topologies are applied to small datasets this can lead to overfitting. One regularization method which has been shown to work in reducing overfitting is the addition of a dropout layer [60].

A variant of VAE called β-VAE has been proposed for anomaly detection [61,62]. This variant adds a new hyper-parameter called β which constricts the encoding capacity of the latent bottleneck and encourages latent representations to be factorized. It should be noted that the standard VAE can be considered a special case of this new variant, the two being equivalent when β = 1.

3.5. Feature selection

High dimensional unbalanced datasets are one of the main challenges of the ML research field [63]. By using methods that enable the manual or automatic choice of a relevant subset from an initially large and potentially redundant set of features, we can potentially improve both the accuracy and the computational overhead of ML methods. A dataset that has a lower dimensionality and is well balanced can also substantially limit overfitting and provide an easier interpretation of prediction results [64]. In the case of unsupervised feature selection, one cannot rely on target variables but must instead utilize some other target criteria and clustering to improve detection accuracy.

The Kurtosis measure can be calculated on each feature as a form of unsupervised feature selection. The Kurtosis of a feature f_i, i = 1, …, n of m samples is computed by first standardizing its samples to zero mean and unit standard deviation:

    k_j = \frac{f_{ij} - \mu_i}{\sigma_i},    j = 1, …, m        (3)

where μ_i represents the mean and σ_i the standard deviation of f_i, and f_{ij} the j-th sample of the i-th feature. Next, the Kurtosis of f_i is computed as:

    K_i = \frac{1}{m} \sum_{j=1}^{m} k_j^4        (4)

Kurtosis is a measure of the ‘‘non-uniformity’’ of the data. It describes the shape of the probability distribution. A high Kurtosis value usually corresponds to a high degree of deviation/outliers, although the interpretation varies depending on the context in which it is applied.
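The isolation-tree construction and path-length scoring described in Section 3.2 can be sketched directly. This is a toy recursive equivalent of the candidate-list procedure, our own illustration rather than the reference implementation of [53]; the data and helper names are invented:

```python
import random

def build_isolation_tree(points, rng):
    """Recursively split points with random (attribute, value) cuts."""
    if len(points) <= 1:
        return None                              # leaf: point isolated
    i = rng.randrange(len(points[0]))            # random attribute
    lo = min(p[i] for p in points)
    hi = max(p[i] for p in points)
    if lo == hi:
        return None                              # cannot split on this attribute
    a = rng.uniform(lo, hi)                      # random split value between min and max
    left = [p for p in points if p[i] <= a]
    right = [p for p in points if p[i] > a]
    return (i, a, build_isolation_tree(left, rng), build_isolation_tree(right, rng))

def path_length(tree, p):
    """Depth at which p is isolated: short paths indicate outliers."""
    depth = 0
    while tree is not None:
        i, a, left, right = tree
        tree = left if p[i] <= a else right
        depth += 1
    return depth

def avg_path_length(points, p, n_trees=100, seed=1):
    """Ensemble score: average path length over many random trees."""
    rng = random.Random(seed)
    return sum(path_length(build_isolation_tree(points, rng), p)
               for _ in range(n_trees)) / n_trees

# A dense 8x8 grid plus one far-away point: the outlier isolates in very few splits.
data = [((j % 8) / 4.0 - 1.0, (j // 8) / 4.0 - 1.0) for j in range(64)] + [(10.0, 10.0)]
score_out = avg_path_length(data, (10.0, 10.0))
score_in = avg_path_length(data, data[27])       # an interior grid point
```

With the cluster in [-1, 1]² and the outlier at (10, 10), the first random cut on either axis isolates the outlier with high probability, so its averaged path length is much shorter than that of an interior point.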
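A much-simplified CBLOF-style score in the spirit of Section 3.3 can be illustrated as follows. This toy sketch is our own: it assumes a clustering is already available and scores each point by its distance to the nearest ‘‘large’’ cluster centroid; the real CBLOF of [55] also includes the clustering step and a cluster-size weighting:

```python
import math

def cblof_scores(points, centroids, labels, alpha=0.75):
    """Toy CBLOF-style scoring given a precomputed clustering.
    Large clusters: the biggest clusters covering >= alpha of all points.
    Score: distance to the nearest large-cluster centroid."""
    sizes = {c: labels.count(c) for c in set(labels)}
    large, covered = [], 0
    for c in sorted(sizes, key=sizes.get, reverse=True):
        large.append(c)
        covered += sizes[c]
        if covered >= alpha * len(points):       # stop once 'large' clusters dominate
            break
    return [min(math.dist(p, centroids[c]) for c in large) for p in points]

# Three points in a tight cluster and one point forming its own small, remote cluster.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
cents = {0: (0.03, 0.03), 1: (5.0, 5.0)}
labs = [0, 0, 0, 1]
scores = cblof_scores(pts, cents, labs)
```

The member of the small remote cluster is far from every large-cluster centroid and therefore receives the highest score, matching the intuition given in the text.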

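Eqs. (3)–(4) of Section 3.5 amount to a few lines of NumPy. The sketch below is our own, with synthetic data invented for the example:

```python
import numpy as np

def kurtosis_scores(F):
    """Per-feature Kurtosis, Eqs. (3)-(4). F has one column per feature f_i."""
    K = ((F - F.mean(axis=0)) / F.std(axis=0)) ** 4   # standardize (Eq. 3), 4th power
    return K.mean(axis=0)                             # average over the m samples (Eq. 4)

rng = np.random.default_rng(7)
m = 10_000
flat = rng.uniform(-1, 1, size=m)        # uniform feature: low Kurtosis (about 1.8)
spiky = rng.normal(size=m)
spiky[:20] += 50.0                       # a few extreme outliers: very high Kurtosis
K = kurtosis_scores(np.column_stack([flat, spiky]))
```

Ranking features by K and keeping the top ones implements the unsupervised selection of highly non-uniform features described above.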

In our case, the Kurtosis level calculated for each feature allows us to select only those features which have a relatively high non-uniform distribution. A known downside of the Kurtosis measure is that it does not consider interactions between the various features. A multidimensional Kurtosis can be obtained by computing the Kurtosis on the Mahalanobis distance of all data points to the centroid of the reprojection of the original data into a lower-dimensional subspace.

Mean Absolute Difference computes the absolute difference from the mean value. A higher score denotes a greater discriminatory potential [65]. It is defined as:

    MAD_i = \frac{1}{m} \sum_{j=1}^{m} |f_{ij} - \mu_i|        (5)

3.5.1. Shapley values

The above-mentioned methods are a useful way of identifying features that might yield a better-performing predictive model. One interesting new approach, not only to select features with a high degree of impact on a prediction but also to explain, in the case of unsupervised methods, why a particular event has been deemed anomalous, is Shapley values [66]. These were first introduced in the study of coalition games. The Shapley value is defined on a value function, denoted as v, of the players in S. It denotes the contribution to the payout of a particular feature value, weighted and summed over all possible combinations:

    \phi_i(v) = \sum_{S \subseteq \{1,…,n\} \setminus \{i\}} \frac{|S|! \, (n - |S| - 1)!}{n!} \, (v(S \cup \{i\}) - v(S))        (6)

where n denotes the set of all players; thus the Shapley value of the game (v, n) is used to distribute the total gain v(n) to each player in accordance with their contribution [33]. In the context of machine learning in general, and AD in particular, a player i corresponds to a feature from the dataset; this way n denotes the total number of input features. Conversely, the Shapley value of a feature i ∈ n, φ_i(v), is the weighted average of its marginal contributions. Following Eq. (6), we can compute the prediction for the feature values in a set S, a subset of the features used to train the model, marginalized over the features that are not in set S, as follows [67]:

    v_x(S) = \int f(x_1, …, x_n) \, dP(x \notin S) - E_X[f(X)]        (7)

In the context of ML models, X is the vector of values from the features of the instance to be explained and n the number of features. Shapley values are symmetrical, in the sense that equal contributions result in equal Shapley values, and non-contributing features have a Shapley value of 0. It also has the efficiency property, meaning that

some tens of users having this network as their only access to the Internet. In [71] a live monitoring web page of GuifiSants, updated hourly, is available. A technical analysis of GuifiSants can be found in [72].

The dataset has been built from data samples gathered from each node every 5 minutes. The dataset is publicly available in Zenodo [19]. The gathering is done using a permanent ssh connection from one central monitoring server to each node in the mesh, which is used to run standard system commands. The dump of these commands is then parsed to obtain the data. This method has the advantage that no changes or additional software need to be installed in the nodes. This is an important condition, since the users are the owners of their nodes. Therefore, only the users’ permission is needed to install a public key to access the node with ssh for monitoring purposes.

The data is obtained by reading the Linux kernel variables available through the /proc file-system: for instance, /proc/net/dev to read counters with the number of bytes and packets transmitted and received over each interface; /proc/stat, where there is information about kernel activity; /proc/meminfo for memory usage, etc. Kernel variables are of two types: (i) absolute values, for instance the CPU 1-minute load average, and (ii) counters that are monotonically increased, for instance the number of transmitted packets. We have converted the counter-type kernel variables into rates by dividing the difference between two consecutive samples by the difference of the corresponding timestamps in seconds. We have eliminated the negative rate samples that occur when a node reboots, or when a counter reaches its maximum value and restarts.

The dataset consists of traffic and non-traffic features. Traffic features are obtained from the counters available in /proc/net/dev of the Linux /proc file-system. For each node, we have considered the counters of received and transmitted bytes and packets over the Ethernet and WiFi interfaces.

For each node, we have also considered the sum of the counter values for both types of interfaces (Ethernet and WiFi), and the sum and difference of received and transmitted bytes and packets over all interfaces. Recall that all counter features are then converted into rates, as explained above. For instance, the features eth.txb.rate-24 and sum.xb.rate-24 refer to the rates of transmitted bytes over Ethernet interfaces, and the total number of received and transmitted bytes, respectively, in node 24 (see Fig. 2).

Non-traffic features include the number of processes, average CPU load, softirq, iowait, process execution time in kernel and user mode, context switches, etc. For the sake of brevity, we do not list here all the features collected in the dataset. The complete list of features
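The counter-to-rate conversion and the removal of negative rates described in this section reduce to a few lines; the sketch below is our own, and the sample values are hypothetical:

```python
def counters_to_rates(samples):
    """samples: list of (timestamp_seconds, counter_value) pairs.
    Returns per-interval rates, dropping negative ones caused by
    reboots or counter wrap-around, as described in the text."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        r = (c1 - c0) / (t1 - t0)
        if r >= 0:                 # negative rate => reboot / counter restart
            rates.append(r)
    return rates

# Hypothetical 5-minute samples of a byte counter; the node reboots before the last one.
samples = [(0, 1_000), (300, 61_000), (600, 121_000), (900, 500)]
rates = counters_to_rates(samples)   # [200.0, 200.0]; the reboot interval is dropped
```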
with a brief description can be found in the public repository of the
feature contributions sum up to the difference of prediction for 𝑥 and
dataset [19].
the average prediction.
Overall there were 63 nodes gathered in the dataset, with a total
The main issue for computing Shapely values for feature selection is of 2387 features. To evaluate the effectiveness of ML methods to detect
that they require a labeled dataset or prediction to explain. In our case, anomalies we proceeded as follows. On April 14, 2021, one of the two
we have unsupervised methods we wish first to have an explanation of mesh gateways (node 24) failed and was replaced. Due to the node
why a particular event has been identified as anomalous. This use-case failure, samples from this node were missing between 01:55 and 17:40
fits very well with Shapely Values. However, we can also compute the of April the 14th. For the testing set, we used the samples gathered
feature importance based on the anomalous events detected and the during a 3 day interval when the failure occurred: April the 13, 14 and
Shapely values calculated. This step can be used to reduce the dataset 15th (694 samples). For the training set, we used the samples gathered
feature space considerably while at the same time it has the potential during the 4 weeks prior to the testing set (7237 samples). Fig. 2 shows
to increase the anomalous event detection rate. the sum.xb.rate-24 traffic feature of node 24 during the training
and testing set.
4. Dataset
5. Experiments and results
In this paper, we will focus on a WCN called GuifiSants [13].
GuifiSants started in 2009 and is part of Guifi.net. The nodes of In this section, we will detail our experiments utilizing unsupervised
GuifiSants consist of antennas flashed with a Linux Openwrt distribu- AD methods. All experiments were run on an IBM Power SC9221 server
tion [68] running the BMX6 mesh routing protocol [69,70]. GuifiSants with 160 Power9 CPUs clocked at 3.7 GHz, 644 GB of RAM, and 4
is a WCN deployed in a neighborhood of the city of Barcelona (Spain) Nvidia V1002 32 GB GDDR5 GPUs with NVLlink.
called Sants. In 2012 Universitat Politècnica de Catalunya (UPC) joined
GuifiSants for research purposes. At the time of writing, there are 1
https://www.ibm.com/downloads/cas/KQ4BOJ3N
2
around 60 nodes in GuifiSants. GuifiSants is a production network with https://www.nvidia.com/en-us/data-center/v100/
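As an illustration of Eq. (6), for a small number of players the Shapley values can be computed exactly by enumerating all coalitions. The sketch below is ours, not part of the paper's pipeline; it uses a toy additive value function purely for illustration:

```python
from itertools import combinations
from math import factorial

def shapley_values(v, n):
    """Exact Shapley values of an n-player game, per Eq. (6).

    v maps a frozenset of player indices to the coalition payout.
    Enumerates all 2^n coalitions, so it is only feasible for small n.
    """
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                # weight |S|! (n - |S| - 1)! / n! of Eq. (6)
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += w * (v(frozenset(S) | {i}) - v(frozenset(S)))
        phi.append(total)
    return phi

# Toy additive game: the payout of a coalition is the sum of its members'
# weights. Player 2 contributes nothing.
weights = [3.0, 1.0, 0.0]
v = lambda S: sum(weights[j] for j in S)
phi = shapley_values(v, 3)
# phi ≈ [3.0, 1.0, 0.0]: in an additive game each player's Shapley value is
# its own weight (symmetry), a non-contributing player gets 0, and the values
# sum to the total gain v({0, 1, 2}) = 4 (efficiency).
```

Replacing the toy v with a model's prediction over masked feature subsets yields the feature attributions discussed above; in practice the experiments rely on the SHAP library's approximations, since exact enumeration is infeasible for thousands of features.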
L. Cerdà-Alabern, G. Iuhasz and G. Gemmi Computer Communications 202 (2023) 191–203
Fig. 2. Plot of the sum.xb.rate-24 traffic feature (sum of received and transmitted traffic over node 24) during the training and testing set (left and right of the solid line, respectively). Dashed lines show the gateway failure interval. Peaks and valleys show daily activity and inactivity periods. Dates on the 𝑥-axis are formatted as day–month hour.
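The counter-to-rate conversion that produces features such as sum.xb.rate-24 (Section 4) can be sketched as follows; this is a minimal NumPy illustration (the function name and the toy samples are ours, not part of the dataset tooling):

```python
import numpy as np

def counters_to_rates(timestamps, counters):
    """Convert a monotonically increasing kernel counter into rates.

    rate_k = (c_k - c_{k-1}) / (t_k - t_{k-1}); negative rates, produced
    when a node reboots or a counter wraps around, are discarded, as done
    for the dataset described in the text.
    """
    t = np.asarray(timestamps, dtype=float)
    c = np.asarray(counters, dtype=float)
    rates = np.diff(c) / np.diff(t)
    keep = rates >= 0  # drop samples affected by a reset
    return t[1:][keep], rates[keep]

# Samples every 300 s (5 min); the counter resets between 600 s and 900 s.
ts = [0, 300, 600, 900, 1200]
cnt = [1000, 4000, 7000, 100, 3100]
t_ok, r = counters_to_rates(ts, cnt)
# The negative-rate sample caused by the reset is removed; the three
# remaining rates are all 10 bytes/s.
```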
Table 1
Anomaly detection method hyper-parameters.

  ML method         Parameters       Value
  PCA               PC number        45
  Isolation Forest  n_estimators     20
                    max_samples      0.7
                    max_features     1.0
  CBLOF             n_clusters       8
                    alpha            0.9
                    beta             5
  VAE               encoder_neurons  [128, 64, 32]
                    decoder_neurons  [32, 64, 128]
                    epochs           30
                    dropout_rate     0.2
                    activation       ReLU

We have used the 4 ML methods presented in Section 3. These methods were chosen because they are radically different approaches to AD. The experiments were run in Python using the libraries scikit-learn [73] for PCA and pyod [74] for Isolation Forest, CBLOF, and VAE. The experiments follow a well-established set of steps. First, the raw data is formatted, cleaned, and normalized. Next, we run several ML-based detection methods to find the best-performing hyper-parameters. Feature selection methods are then applied to improve these initial results. Finally, we run a second set of experiments with the new subset of the original data and analyze the final outcome. The Python scripts used to process the datasets can be found in the public repository https://github.com/llorenc/ml-comcom.

As with all ML tasks, data understanding and cleaning is one of the most time-consuming and crucial steps. In our experiments, we pruned those features which had a low variance in the selected time frame. Low-variance features were identified globally, encompassing all nodes of the mesh network. The training data comprises 7237 samples, each with 1585 features after pruning, while the testing set comprises 694 samples with the same feature space as the training set.

Data normalization, when incorrectly applied to imbalanced datasets, can seriously degrade the performance of ML methods. It is of the utmost importance to normalize discrepancies within the dataset features: insignificant as well as dominant values should be within an acceptable range [75]. We have chosen the min–max technique, which normalizes the data into the [0, 1] range as follows:

X_n = (X − X_min) / (X_max − X_min)    (8)

The resulting normalized values, denoted by X_n, have the benefit of preserving the relationships between features, thus virtually eliminating bias. Furthermore, in the case of some ML methods (e.g. deep learning and neural network based solutions), correct normalization of the data improves the convergence rate. For PCA we have found it more convenient to apply the min–max normalization only to non-traffic features. Traffic features have been normalized by dividing each feature vector by the mean over the maximum of each traffic feature set. PCA is very sensitive to the normalized feature ranges. Thus, proceeding this way, we give more importance to traffic over non-traffic features, while maintaining the relative importance of nodes having higher traffic values.

Most libraries for unsupervised anomaly and outlier detection methods require an expected contamination factor to be defined for training. As anomalous events happen rarely, and we wish to avoid false positives as much as possible, we set the contamination to a low value of α = 0.005. Then, the threshold for a score to be considered anomalous is given by the 1 − α quantile of the training set. For our training set of 7237 samples this contamination factor yields ⌈α × 7237⌉ = 37 anomalies. Note that over the 694 samples of the testing set, under normal conditions we would foresee having ⌈α × 694⌉ = 4 anomalies. However, due to the gateway failure that occurs during the testing interval, we expect the different ML methods to detect a larger number of anomalies.

For each of these algorithms, we had to tune the hyper-parameters manually, so that the anomalous events are grouped inside the time-frame in which we know anomalous events occurred. Table 1 shows the hyper-parameters used for each ML detection method. These values were fine-tuned using Random Search with a manually defined hyper-parameter search space. In the case of PCA, the only parameter is the number of principal components, PC. To assess the number of PC we have used the common method of choosing the number of dimensions that preserve 95% of the variance. In the case of CBLOF, we used the default values for all parameters, as these provided the best result. The VAE parameters include a dropout rate, which can help with model generalization and prevent overfitting. The shape of the encoder and decoder parts of the VAE is also symmetrical. An important consideration is that we set the batch size to 32. The hardware on which these experiments were executed can handle a substantially larger batch size; however, to keep it in line with what we could expect on more limited hardware, such as an Nvidia Jetson Nano board (https://developer.nvidia.com/embedded/jetson-nano-developer-kit), we used a much smaller value. A similar approach has been taken when setting the number of training epochs. We are aware that both batch size and training epochs are important hyper-parameters; however, our ultimate aim is to see
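The min–max normalization of Eq. (8) and the 1 − α quantile threshold can be sketched as follows. This is a minimal NumPy illustration on synthetic data; in the actual experiments the scores come from the detectors, and constant features are assumed to have been pruned beforehand:

```python
import numpy as np

def min_max(X):
    """Per-feature min-max normalization into [0, 1], as in Eq. (8).

    Assumes no constant features (those are pruned in the cleaning step).
    """
    X = np.asarray(X, dtype=float)
    xmin, xmax = X.min(axis=0), X.max(axis=0)
    return (X - xmin) / (xmax - xmin)

def anomaly_threshold(scores, alpha=0.005):
    """Anomaly threshold at the 1 - alpha quantile of the training scores."""
    return np.quantile(scores, 1.0 - alpha)

rng = np.random.default_rng(0)
X = rng.normal(size=(7237, 3)) * [1.0, 10.0, 100.0]  # very different scales
Xn = min_max(X)  # every feature now spans exactly [0, 1]

# Placeholder training scores; with alpha = 0.005 and 7237 samples,
# about ceil(0.005 * 7237) = 37 of them exceed the threshold.
scores = rng.random(7237)
thr = anomaly_threshold(scores)
n_anom = int((scores > thr).sum())
```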
how the above-mentioned ML methods will behave on devices that might be located in an Edge/Fog type scenario. We should note that, because we lack a labeled dataset, it is virtually impossible to apply standard hyper-parameter optimization techniques. Instead, we focus on finding those hyper-parameters which result in a predictive model with high stability and repeatability. In our case, we gauge this by looking for those models which consistently detect anomalies within our testing interval. We should note that the algorithm hyper-parameters are not selected at random nor in isolation, as they are, in many cases, linked with one another. Although in our experiments we used unguided optimization techniques, the hyper-parameter space for each algorithm was defined after significant consideration. For example, in the case of IF, a low estimator count with the maximum number of features and samples would lead to model overfitting. In our case, for IF the estimator count was defined in the range [10, 100], the sample size in [0.3, 1.0] and the fraction of features in [0.2, 1.0]. Similarly, in the case of VAE, the number of epochs was set in the range [5, 100] and the dropout in the range [0.1, 0.8]. A total of 3 activation function types were tested ('relu', 'elu', 'parametric_relu'). The encoder and decoder topologies are symmetrical and usually follow a descending and an ascending pattern, respectively. For our experiments, the initial neuron count for the encoder was defined in the range [32, 512].

Arguably, Isolation Forest was the most difficult to tune. As described in Section 3.2, Isolation Forest has many steps which are affected by randomness to a substantial degree. This is useful in many instances; however, it can yield inconsistent results. If no random seed is used on a badly tuned model, each training iteration of the Isolation Forest can yield different results. After hundreds of experimental runs, we decided on a relatively small number of estimators (20), the maximum number of features for training each estimator, and 0.7 as the sample size, in order to help avoid overfitting.

The results of all experiments are contained in Table 2. For each algorithm, we list training and inference times as well as the number of anomalies detected in the testing set, and how many of them fall inside the gateway failure interval (122 samples were gathered during the failure interval). Since the gateway failure is clearly an anomalous condition, the number of samples detected as anomalous by the algorithms belonging to the gateway failure interval can be used as a performance measure. For each algorithm, Table 2 shows the anomalies detected using all features and using only traffic features. By comparing the anomalies detected in both cases, we see that considering all features significantly increases the performance of the algorithms. For example, with VAE with all features, 134 anomalies were detected, which include all 122 samples gathered during the gateway failure interval. Using traffic features alone, only 23 anomalies were detected, of which only 12 were within the gateway failure interval. In terms of runtime, we can clearly see that model inference requires less runtime than training. In our case, we only measured inference on the training dataset, since it is an order of magnitude larger than the testing set.

Table 2
Results of the ML method based experiments.

  ML method         Features        Num. of features  Training [s]  Inference [s]  Testing anomalies
  PCA               (all)           1585              1.25          0.14           151/122
                    (only traffic)  880               0.88          0.11           13/11
  Isolation Forest  (all)           1585              2.33          1.04           74/69
                    (only traffic)  880               0.97          0.37           10/4
  CBLOF             (all)           1585              18.28         2.71           125/115
                    (only traffic)  880               3.42          0.09           3/2
  VAE               (all)           1585              58.61         1.09           134/122
                    (only traffic)  880               39.36         0.49           23/12

The next step in our experiments was to test the ML methods by varying the size of the dataset. For this purpose, we first varied the number of samples used for the training set. The results are given in Fig. 3, showing the number of anomalies with respect to the training samples gathered during the number of days indicated on the abscissa axis. The training samples are taken from the day immediately prior to the testing set, up to all the days of the previous 4 weeks. Fig. 3 shows quite different performance for the different ML methods. While PCA, CBLOF, and VAE generate a large number of anomalies, and thus false positives, when only the samples from a small number of days are considered, Isolation Forest only detects a small number of anomalies, and thus has a large number of false negatives. On the other hand, when using the samples of more than 13 days, the anomalies detected by VAE remain almost the same, correctly detecting as anomalous all the samples within the gateway failure interval. The number of anomalies detected by the other methods oscillates even after using the samples of more than 20 days for the training set. We can conclude that, in our experiments, VAE is able to reliably detect anomalies with a smaller number of samples.

Next, we investigated the ML methods varying the number of features. We chose from one to all features, ordered by significance, and computed the number of anomalies. To compute the significance of a feature we used the contribution plot of the PCA Q statistics (see Section 3.1), and the selection methods detailed in Section 3.5. To compute the Shapley values of the predictions we used the SHAP library [76], which uses several approximation methods to compute the Shapley values. Isolation Forest was chosen to compute the Shapley values of the testing set predictions, as it is well supported by the SHAP library and is relatively fast when it comes to training and inference times. As payout (prediction function) we used 0 and 1 for samples classified as normal and anomalous, respectively. Fig. 4 shows the number of anomalies detected in the testing set, and how many of them fall within the gateway failure interval, varying the number of features used for training. The anomalies computed with the ML methods listed in each row are divided according to the type of feature selection method, shown at the top of each column. The number of features is ordered by a decreasing absolute value given by each feature selection method. The figure shows that the studied ML methods behave very differently depending on the number of features used. Isolation Forest is the most sensitive: even when a large number of features is used, adding just one additional feature can cause the method to significantly change the number of points considered anomalous. On the other hand, VAE is shown to be the most stable. In fact, using PCA, MAD, and SHAP as the feature selection method, the number of anomalies detected using VAE remains almost constant when the number of features is larger than about one fourth of the feature set. As for the feature selection methods, Fig. 4 shows that kurtosis gives the worst performance. For the other methods, Fig. 4 shows that there is no clear winner and that, depending on the ML algorithm, a different feature selection method may give better results.

As explained above, the contamination parameter set during training produced 37 anomalies out of the 7237 samples. We should also mention that we specifically selected a time frame where we assume no anomalous events have occurred. Even so, except for PCA, the ML methods largely detect the same anomalies. This can easily be seen in Fig. 5(a). This figure shows the detected overlapping anomalies among the 4 ML methods in the training set. The largest difference occurs between PCA and the other ML methods. This is possibly due to the different normalization applied to PCA. However, if we consider the testing set, there is a high agreement among all the ML methods, as shown in Fig. 5(b). First, note that the number of anomalies is much higher than the 4 anomalies that we would expect. Of course, this is due to the gateway failure interval, where most of the anomalies are found, as shown in Fig. 5(c). For example, Fig. 5(c) shows that PCA and VAE detect all 122 points in the gateway failure interval as anomalous. However, PCA detects 151 anomalous points in the testing set, while VAE detects only 134. Possibly, the higher number of points detected by PCA is due to overfitting.

To have some insight into the ML methods being compared, Fig. 6 shows a 2D PCA projection of the training set (principal components PC1, PC2) for Isolation Forest, CBLOF, and VAE. Note that the projected
Fig. 3. Number of anomalies found in the testing set varying the number of samples used for training (samples gathered during the previous days). The figure also shows the
number of anomalies obtained within the gateway failure interval (dashed line), and the number of points in this interval (122).
Fig. 4. Number of anomalies found in the testing set, as in Fig. 3, but varying the number of features used for training. The features are selected with different methods (shown
at the top of each column).
Fig. 5. Detected overlapping anomalies. Each row/column represents an AD method. Each cell of the heatmap represents the number of overlapping anomalies detected by that pair of methods. The total number of points in each set is shown at the top of each figure.
Fig. 6. Decision boundaries of the ML methods re-projected using PCA with 2 components. The projected points are those in the training set (circles), anomalies due to the
contamination factor in the training (squares), and anomalies in the test set (stars).
Fig. 7. SHAP heatmap plots for the training and testing anomalies obtained with Isolation Forest. The 𝑦-axis shows the model inputs sorted in descending order from top to
bottom, for each anomalous event.
Fig. 8. Features contributing most to SHAP score during the testing set. Vertical lines correspond to the failure interval. Marked samples correspond to anomalies detected by
PCA.
points have the same coordinates, since we have used the same scaling for these methods. We have marked the 37 anomalies computed for the training set, and we have also projected the anomalies computed for the testing set. In addition, we have run the methods for the projected training set in 2D and plotted the resulting decision boundary. Note that the anomalies that would be detected using the 2D training set (those points outside the decision boundary) would be quite different from the anomalies detected using the full feature set. We can easily see the differences between how each ML method ''decides'' what is an anomalous event and what is not. Of particular interest is the decision boundary calculated for Isolation Forest. Since this method is based on the same basic concepts as supervised decision tree models (in particular, Random Forest), we can safely assume that some of their idiosyncrasies can also be found in Isolation Forest. Fig. 6 clearly suggests that model overfitting can be a problem in the case of Isolation Forest. This can lead to poor out-of-sample performance.

For the interpretation of the anomalies, Fig. 7 shows a SHAP heatmap of the anomalies obtained using the training and testing sets, in sub-figures (a) and (b), respectively. On the abscissa axis we see the anomalous events, while the ordinate axis shows the model inputs, sorted in decreasing order from top to bottom. The computed Shapley values are shown on a color scale, based on hierarchical clustering and their explanation similarity. At the top of each subfigure is the output of the f(x) function with the predicted values, which is always 1 because we are plotting only the anomalous samples. In Fig. 7 it can be seen that the colors of the anomalies obtained for the training set are much lighter and sparser, while they are more intense and frequent for the testing set. This makes it clear that there is a group of features that are significantly affected by the gateway failure during the testing set and, therefore, are assigned a high Shapley value for most of the anomalies detected in the testing set. It is also interesting to note in Fig. 7 that none of the features that contribute most to the anomalies found using the training set (obtained by setting the contamination factor) match those features that contribute most using the testing set (caused by the gateway failure). This fact suggests, as expected, that the anomalies detected in the two sets are of a different nature.

Fig. 8 shows a time series of the two features that contribute most to the AD in the testing set (as shown in Fig. 7(b)). These are the CPU context switch rate of node 42 (ctxt.rate-42), and the rate of transmitted bytes of node 7 (txb.rate-7). Note that ctxt.rate-42 has an average value of around 200 during normal operation, and 0 during the gateway failure. This is because during the gateway failure node 42 was disconnected from the network, and no samples were gathered. These missing samples were set to 0. On the other hand, Fig. 8 shows that txb.rate-7 has an atypically high value during the gateway failure. This is because node 7 is close to the gateway that failed, and during the failure it absorbed most of the traffic that was redirected to the other gateway in the mesh.

5.1. Discussion

Our experimental results have been interesting: we found that all 4 selected ML detection methods perform well on the given dataset. Unfortunately, in the case of the feature selection methods, this is not the case. The method performing best in most cases is based on the calculation of Shapley values, which are computationally expensive. This is especially true in the case of large feature spaces, as the total number of possible coalitions is 2^n, where n is the number of features. The SHAP library we used in our experiments utilizes a permutation-based Shapley value estimator. VAE performed the best with the full feature space, while Isolation Forest improved the most after Shapley value based feature selection. Overall, the CBLOF performance was the most consistent, with relatively fast training and inference times. The MAD selection method also provided a noteworthy reduction in feature space, especially for the Isolation Forest and VAE methods. However, in the case of CBLOF the results are not as stable: recursively adding features
Fig. 9. SHAP representation of an anomalous and a normal sample randomly taken from the testing set. At the bottom is the expected value of the model. Each line shows the positive (red) or negative (blue) contribution of each feature to the prediction. The features are sorted in descending order of contribution.
Fig. 10. Feature separability for most impactful features. The figure shows the histograms of the scaled most impactful features for the normal and anomalous events of the testing
set.
based on their MAD score can cause a significant dip in predictive performance.

One issue which has come up during our experiments is that we do not know how many anomalous event types there are in the dataset. All 4 methods label samples as anomalous or normal, but they do not make a distinction regarding anomaly types. Fig. 9 shows waterfall plots from SHAP representing an anomalous event and a normal event, picked at random from the testing set; the features are sorted in decreasing order. We can see that the predictions are f(x) = 1 and f(x) = 0 for the anomalous and the normal sample, respectively, with a base value of E[f(x)] = 0.48. This base value represents the average of the target variable for all events in the dataset. The features colored in red shift the predictions towards 1 (anomalous event), while those marked in blue shift them towards 0 (normal event). The separation of predicted events can be easily observed. This also remains true when we plot histograms for the most impactful features of the testing set, as seen in Fig. 10.

This information, although useful, will still not answer the question: how many anomaly types are there, and how does the reduction in feature space affect prediction? It is highly likely that by reducing the feature space we effectively remove the capacity of our detection methods to detect any other anomaly type. For example, if we only have CPU metrics in a dataset, any memory-related anomalies would be almost impossible to find.

To answer this question we decided to cluster the detected anomalies using a method that does not force each datapoint (event) to belong to a cluster and also has the concept of noise. For this we selected the Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) [77] algorithm, which in itself is also a method used for AD. It performs DBSCAN [78] over varying epsilon (eps) values and integrates the results to find a clustering of the data points with the best stability over eps.

In standard DBSCAN, eps specifies how close points should be to each other to be considered part of a cluster. The setting of this parameter is a major challenge: if it is set too small, a substantial part of the data will not be assigned to a cluster and will be marked as outliers; if it is set too high, all clusters are merged into a single large cluster. HDBSCAN eliminates this issue by returning meaningful clusters with little parameter tuning necessary.

In our case, we set the minimum cluster size to 30. For consistency, we used the anomalies identified by Isolation Forest. Fig. 11 shows that HDBSCAN identified one cluster containing 97 anomalous events, with 20 anomalous events identified as noise. Of these anomalies, 107 correspond to the gateway failure interval, 14 of them being classified as noise and 93 falling in the cluster. Our interpretation of these results is that most anomalous events we have identified are of the same type, which corresponds to the gateway failure.

Fig. 11. Anomalous event clustering using HDBSCAN. On the left is the anomalous event cluster (97 points), and on the right is the noise (20 points). 107 points correspond to the gateway failure (circles), of which 93 are in the anomalous cluster, and 14 in the noise.
When it comes to the complexity of the selected ML methods, we should first state that IF is an algorithm especially suited to processing high volumes of data, having a linear time complexity with a low constant and low memory requirements (depending on the hyper-parameters used) [1]. Similarly, CBLOF is also well suited to large volumes of data, with O(N log N) complexity in the best-case scenario [55]. Lastly, VAE is notorious for its high computational complexity, which largely depends on the network topology used during training. However, in our experiments this added complexity paid off, with VAE being able to outperform the other methods while having a minimal topology, a topology which is a direct result of the structure of the dataset used. Also, the VAE models were the most consistent during hyper-parameter optimization, resulting in workable models in all but the most extreme hyper-parameter values.

We should also point out that our experiments encompassed several additional ML methods. The results presented in this paper are based on the performance of each model. For example, we have performed experiments using GAN-based methods such as ALAD [79] and AnoGAN [80]. The main difference between the two is that ALAD is built on top of a two-way GAN, which avoids the computationally expensive inference of AnoGAN. Unfortunately, the experimental results have been poor in both cases. ALAD had erratic results: even with a fixed random seed and hyper-parameters, the resulting models gave different results, while the AnoGAN training took a long time (well over 2 hours) with limited results. GAN-based networks are known to be difficult to train. They suffer from mode collapse and generate only a subset of the original dataset [62].

Similarly, we ran several experiments using β-VAE with values for β ranging in [0.1, 10.0]. However, the best results were obtained using the default β value of 1.0 included in the original paper. This was not completely unexpected, as there are some issues with the reproducibility of β-VAE results [81]. Finally, we also used an implementation of Deep-SVDD [82] in our experiments, with results notably worse than the algorithms presented in Section 4.

For brevity, we did not include all the algorithms in our experiments; we selected the 3 that have the best performance and also have different underlying principles. However, the code and hyper-parameter settings used for these experiments are available in our repository.

We conclude that the lack of predictive performance of some of the methods used in our experiments is due to the large feature space of our dataset. Although we find that the use of feature selection techniques performs well for the three best performing models (see Fig. 3), future work will focus on additional methods. Non-negative matrix factorization (NMF) [83] is promising, but our initial experiments with it have produced limited results. Future work will focus on addressing this issue and will include a reformulation of the training and testing datasets, including graphs of the routing tables used in the mesh network, as well as the inclusion of additional AD methods.

6. Conclusion

In the last decade, machine learning, ML, methods have exploded and have been applied to an increasing number of fields. Computer […] Local Outlier Factor, CBLOF, uses the clustering principle, choosing as anomalous those samples that have larger distances to the formed clusters. Finally, we have used Variational AutoEncoders, VAE, which is based on a deep neural network model. All these methods have been shown in the literature to be particularly suitable for AD.

Unlike other works found in the literature, we have constructed a dataset of a production network. For our dataset we used an interval in which an unprovoked gateway failure occurred. This rare event allowed us to analyze the performance of the ML methods under study by checking how many of the samples gathered during the gateway failure interval are flagged as anomalous. The dataset was constructed by collecting samples every 5 minutes. The data samples have not only traffic characteristics, but also non-traffic characteristics, such as CPU usage, memory usage, etc. We used a period of 4 weeks for the training set, and a period of 3 days for the testing set. The gateway failure lasted 16 hours in the middle of the testing set. The main conclusions can be summarized as follows:

• When properly tuned, the anomalies derived from the gateway failure are well captured by all the ML methods.
• Outside the failure interval, several traffic spikes are also marked as anomalies. These spikes, however, can be considered as noise, and a consequence of the irregular traffic patterns of the users' traffic.
• The deep learning method based on VAE outperforms the other ML methods, at the cost of a significantly higher computational cost.
• Considering other features related to CPU and memory, in addition to traffic features, significantly increases the number of points detected as anomalous inside the gateway failure interval.
• Of the feature selection methods we tested, mean absolute difference, MAD, and Shapley value, SHAP, performed the best. MAD is much easier to calculate, while SHAP provides a better understanding of why events are flagged as anomalous (explainability).

CRediT authorship contribution statement

Llorenç Cerdà-Alabern: Conceptualization, Investigation, Software, Writing, Dataset gathering. Gabriel Iuhasz: Conceptualization, Investigation, Software, Writing. Gabriele Gemmi: Investigation, Writing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
networks are no exception, and numerous works have used ML to The dataset is available at the link to Zenodo provided in the paper.
investigate network intrusion detection. Fault detection in computer
networks is an attractive application for ML that has received little
attention, largely due to the lack of available datasets. In this paper, Acknowledgments
we attempt to fill this gap by performing anomaly detection, for fault
detection using ML. The MLs chosen for our study are based on 4 very
different principles. The first one is Principal Component Analysis. PCA This work has received funding through the DiPET CHIST-ERA
is a well-known method that has been widely used in the manufacturing under grant agreement PCI2019-111850-2; Spanish grant PID2019-
industry. PCA is based on the projection onto a reduced dimension sub- 106774RB-C21; Romanian DIPET (62652/15.11.2019) project funded
space that preserves maximum variance. The second is Isolation Forest, via PN 124/2020; and has been partially supported by the EU research
IF. IF is based on the idea of constructing decision trees that isolate the project SERRANO (101017168) and hardware resources courtesy of
samples into different classes, and take as anomalous those that are the Romanian Ministry of Research and Innovation UEFISCDI COCO
isolated in a smaller number of steps. The third one, Clustering-Based research project PN III-P4-ID-PCE-2020-0407.
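As an illustration of two of the anomaly-scoring principles summarized above, the following minimal sketch shows PCA reconstruction error and Isolation Forest scoring with scikit-learn [73]. This is our illustration only, not the code from the paper's repository: the data, feature dimensions, and failure window are synthetic placeholders standing in for the real traffic/CPU/memory features.

```python
# Illustrative sketch (synthetic data, NOT the paper's dataset): anomaly
# scoring with PCA reconstruction error and Isolation Forest.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Stand-in for per-node features (traffic, CPU, memory, ...): hourly samples
# for a 4-week training period, plus a short test window with a failure.
X_train = rng.normal(size=(4 * 7 * 24, 8))               # normal operation
X_test = np.vstack([rng.normal(size=(56, 8)),            # normal hours
                    rng.normal(loc=6.0, size=(16, 8))])  # 16 anomalous hours

scaler = StandardScaler().fit(X_train)
Xtr, Xte = scaler.transform(X_train), scaler.transform(X_test)

# PCA: project onto the reduced-dimension subspace preserving maximum
# variance; a sample's reconstruction error serves as its anomaly score.
pca = PCA(n_components=3).fit(Xtr)
recon = pca.inverse_transform(pca.transform(Xte))
pca_score = np.sum((Xte - recon) ** 2, axis=1)

# Isolation Forest: samples isolated in fewer random splits are more
# anomalous (score_samples is higher for inliers, so we negate it).
iforest = IsolationForest(n_estimators=100, random_state=0).fit(Xtr)
if_score = -iforest.score_samples(Xte)

# Both scores should be clearly higher inside the injected failure window.
print("PCA  :", pca_score[:56].mean(), pca_score[56:].mean())
print("IF   :", if_score[:56].mean(), if_score[56:].mean())
```

With the synthetic failure injected as a mean shift, both detectors separate the failure window from normal operation without any labels, which mirrors the unsupervised setting of the study.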
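Likewise, the mean absolute difference (MAD) filter mentioned in the conclusions can be sketched in a few lines: each feature is scored by the mean absolute deviation of its values from their mean, and high-dispersion features are kept. The feature names below are hypothetical placeholders, not columns of the published dataset.

```python
# Illustrative MAD feature-ranking sketch (hypothetical feature names).
import numpy as np

def mad_scores(X):
    """MAD_j = (1/n) * sum_i |x_ij - mean_j|, one dispersion score per column."""
    return np.mean(np.abs(X - X.mean(axis=0)), axis=0)

rng = np.random.default_rng(1)
features = ["tx_bytes", "rx_bytes", "cpu_load", "mem_free"]  # placeholders
# Columns with decreasing dispersion, standing in for monitored metrics.
X = np.column_stack([rng.normal(0, s, 1000) for s in (4.0, 3.0, 1.0, 0.2)])

scores = mad_scores(X)
ranking = [features[i] for i in np.argsort(scores)[::-1]]
print(ranking)  # highest-dispersion features first
```

This simplicity is the point made in the last bullet: MAD is much cheaper to compute than SHAP, at the cost of offering no explanation of why a particular event was flagged.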


References

[1] V. Chandola, A. Banerjee, V. Kumar, Anomaly detection: A survey, ACM Comput. Surv. 41 (3) (2009) 1–58.
[2] C.C. Aggarwal, Outlier Analysis, Springer, 2017.
[3] M. Ahmed, A.N. Mahmood, J. Hu, A survey of network anomaly detection techniques, J. Netw. Comput. Appl. 60 (2016) 19–31.
[4] D.P. Kumar, T. Amgoth, C.S.R. Annavarapu, Machine learning algorithms for wireless sensor networks: A survey, Inf. Fusion 49 (2019) 1–25.
[5] M. Mohri, A. Rostamizadeh, A. Talwalkar, Foundations of Machine Learning, MIT Press, 2018.
[6] K.P. Murphy, Probabilistic Machine Learning: An Introduction, MIT Press, 2022.
[7] L.N. Tidjon, M. Frappier, A. Mammar, Intrusion detection systems: A cross-domain overview, IEEE Commun. Surv. Tutor. 21 (4) (2019) 3639–3681.
[8] G. Fernandes, J.J. Rodrigues, L.F. Carvalho, J.F. Al-Muhtadi, M.L. Proença, A comprehensive survey on network anomaly detection, Telecommun. Syst. 70 (3) (2019) 447–489.
[9] D. Vega, R. Baig, L. Cerdà-Alabern, E. Medina, R. Meseguer, L. Navarro, A technological overview of the guifi.net community network, Comput. Netw. 93 (2015) 260–278, http://dx.doi.org/10.1016/j.comnet.2015.09.023.
[10] Y. Ben David, Connecting the Last Billion (Ph.D. thesis), UC Berkeley, 2015.
[11] Guifi.net, Open, free and neutral network internet for everybody, 2021, http://guifi.net/en. (Accessed 13 January 2021).
[12] L. Cerdà-Alabern, R. Baig, L. Navarro, On the guifi.net community network economics, Comput. Netw. 168 (2020) 107067.
[13] GuifiSants, Xarxa oberta, lliure i neutral del barri de Sants, 2021, http://sants.guifi.net/. (Accessed January 2021).
[14] J. Camacho, A. Pérez-Villegas, P. García-Teodoro, G. Maciá-Fernández, PCA-based multivariate statistical network monitoring for anomaly detection, Comput. Secur. 59 (2016) 118–137.
[15] D.H. Hoang, H.D. Nguyen, A PCA-based method for IoT network traffic anomaly detection, in: 2018 20th International Conference on Advanced Communication Technology, ICACT, IEEE, 2018, pp. 381–386.
[16] I.K. Savvas, A.V. Chernov, M.A. Butakova, C. Chaikalis, Increasing the quality and performance of N-dimensional point anomaly detection in traffic using PCA and DBSCAN, in: 2018 26th Telecommunications Forum, TELFOR, IEEE, 2018, pp. 1–4.
[17] Munin networked resource monitoring tool, http://munin-monitoring.org.
[18] Nagios, The industry standard in IT infrastructure monitoring, https://www.nagios.org.
[19] L. Cerdà-Alabern, Dataset for Anomaly Detection in a Production Wireless Mesh Community Network, Zenodo, 2022, http://dx.doi.org/10.5281/zenodo.6169917.
[20] V. Hodge, J. Austin, A survey of outlier detection methodologies, Artif. Intell. Rev. 22 (2) (2004) 85–126.
[21] S. Northcutt, J. Novak, Network Intrusion Detection, Sams Publishing, 2002.
[22] A. Lakhina, K. Papagiannaki, M. Crovella, C. Diot, E.D. Kolaczyk, N. Taft, Structural analysis of network traffic flows, in: Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems, 2004, pp. 61–72.
[23] Z.R. Zaidi, S. Hakami, T. Moors, B. Landfeldt, Detection and identification of anomalies in wireless mesh networks using principal component analysis (PCA), J. Interconnect. Netw. 10 (4) (2009) 517–534.
[24] Z.R. Zaidi, S. Hakami, B. Landfeldt, T. Moors, Real-time detection of traffic anomalies in wireless mesh networks, Wirel. Netw. 16 (6) (2010) 1675–1689.
[25] C. Pascoal, M.R. De Oliveira, R. Valadas, P. Filzmoser, P. Salvador, A. Pacheco, Robust feature selection and robust PCA for internet traffic anomaly detection, in: 2012 Proceedings IEEE INFOCOM, IEEE, 2012, pp. 1755–1763.
[26] H. Ringberg, A. Soule, J. Rexford, C. Diot, Sensitivity of PCA for traffic anomaly detection, in: Proceedings of the 2007 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2007, pp. 109–120.
[27] N.L.D. Khoa, T. Babaie, S. Chawla, Z. Zaidi, Network anomaly detection using a commute distance based approach, in: 2010 IEEE International Conference on Data Mining Workshops, IEEE, 2010, pp. 943–950.
[28] A.B. Nassif, M.A. Talib, Q. Nasir, F.M. Dakalbab, Machine learning for anomaly detection: A systematic review, IEEE Access 9 (2021) 78658–78700, http://dx.doi.org/10.1109/ACCESS.2021.3083060.
[29] S. Goldstein, A comparative evaluation of unsupervised anomaly detection algorithms for multivariate data, PLoS One 11 (4) (2016) 1–31, http://dx.doi.org/10.1371/journal.pone.0152173.
[30] M. Ahmed, A.N. Mahmood, Novel approach for network traffic pattern analysis using clustering-based collective anomaly detection, Ann. Data Sci. 2 (2015) 111–130.
[31] X. Chun-Hui, S. Chen, B. Cong-Xiao, L. Xing, Anomaly detection in network management system based on isolation forest, in: 2018 4th Annual International Conference on Network and Information Systems for Computers, ICNISC, 2018, pp. 56–60, http://dx.doi.org/10.1109/ICNISC.2018.00019.
[32] G. Pang, C. Shen, L. Cao, A.V.D. Hengel, Deep learning for anomaly detection: A review, ACM Comput. Surv. 54 (2) (2021) http://dx.doi.org/10.1145/3439950.
[33] N. Takeishi, Y. Kawahara, On anomaly interpretation via Shapley values, 2020, arXiv:2004.04464.
[34] L. Antwarg, R.M. Miller, B. Shapira, L. Rokach, Explaining anomalies detected by autoencoders using Shapley additive explanations, Expert Syst. Appl. 186 (2021) 115736, http://dx.doi.org/10.1016/j.eswa.2021.115736.
[35] C. Zhou, R.C. Paffenroth, Anomaly detection with robust deep autoencoders, in: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '17, Association for Computing Machinery, New York, NY, USA, 2017, pp. 665–674, http://dx.doi.org/10.1145/3097983.3098052.
[36] D.P. Kingma, M. Welling, Auto-encoding variational Bayes, 2014, arXiv:1312.6114.
[37] M. Moulay, R.G. Leiva, V. Mancuso, P.J.R. Maroni, A.F. Anta, Ttrees: Automated classification of causes of network anomalies with little data, in: 2021 IEEE 22nd International Symposium on a World of Wireless, Mobile and Multimedia Networks, WoWMoM, IEEE, 2021, pp. 199–208.
[38] K. Sequeira, M. Zaki, ADMIT: Anomaly-based data mining for intrusions, in: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '02, Association for Computing Machinery, New York, NY, USA, 2002, pp. 386–395, http://dx.doi.org/10.1145/775047.775103.
[39] Y.F. Zhang, Z.Y. Xiong, X.Q. Wang, Distributed intrusion detection based on clustering, in: 2005 International Conference on Machine Learning and Cybernetics, vol. 4, 2005, pp. 2379–2383, http://dx.doi.org/10.1109/ICMLC.2005.1527342.
[40] M.H. Bhuyan, D.K. Bhattacharyya, J.K. Kalita, An effective unsupervised network anomaly detection method, in: Proceedings of the International Conference on Advances in Computing, Communications and Informatics, ICACCI '12, Association for Computing Machinery, New York, NY, USA, 2012, pp. 533–539, http://dx.doi.org/10.1145/2345396.2345484.
[41] N. Hu, Z. Tian, H. Lu, X. Du, M. Guizani, A multiple-kernel clustering based intrusion detection scheme for 5G and IoT networks, Int. J. Mach. Learn. Cybern. 12 (2021) http://dx.doi.org/10.1007/s13042-020-01253-w.
[42] M.E. Otey, A. Ghoting, S. Parthasarathy, Fast distributed outlier detection in mixed-attribute data sets, Data Min. Knowl. Discov. 12 (2–3) (2006) 203–228, http://dx.doi.org/10.1007/s10618-005-0014-6.
[43] M. Bhuyan, D.K. Bhattacharyya, J. Kalita, A multi-step outlier-based anomaly detection approach to network-wide traffic, Inform. Sci. 348 (2016) http://dx.doi.org/10.1016/j.ins.2016.02.023.
[44] P. Casas, J. Mazel, P. Owezarski, Unsupervised network intrusion detection systems: Detecting the unknown without knowledge, Comput. Commun. 35 (7) (2012) 772–783, http://dx.doi.org/10.1016/j.comcom.2012.01.016.
[45] O. Iraqi, H. El Bakkali, Application-level unsupervised outlier-based intrusion detection and prevention, Secur. Commun. Netw. 2019 (2019) 1–13, http://dx.doi.org/10.1155/2019/8368473.
[46] M. Khan, Rule based network intrusion detection using genetic algorithm, Int. J. Comput. Appl. 18 (2011) 26–29, http://dx.doi.org/10.5120/2303-2914.
[47] H. Alsaadi, R. Almuttairi, O. Ucan, O. Bayat, An adapting soft computing model for intrusion detection system, Comput. Intell. 01 (2021) http://dx.doi.org/10.1111/coin.12433.
[48] A. Shenfield, D. Day, A. Ayesh, Intelligent intrusion detection systems using artificial neural networks, ICT Express 4 (2) (2018) 95–99, http://dx.doi.org/10.1016/j.icte.2018.04.003, SI on Artificial Intelligence and Machine Learning.
[49] C.F. Alcala, S.J. Qin, Analysis and generalization of fault diagnosis methods for process monitoring, J. Process Control 21 (3) (2011) 322–330.
[50] P. Miller, R.E. Swanson, C.E. Heckler, Contribution plots: A missing link in multivariate quality control, Appl. Math. Comput. Sci. 8 (4) (1998) 775–792.
[51] C.C. Aggarwal, S. Sathe, Theoretical foundations and algorithms for outlier ensembles, SIGKDD Explor. Newsl. 17 (1) (2015) 24–47, http://dx.doi.org/10.1145/2830544.2830549.
[52] C.C. Aggarwal, Outlier ensembles: Position paper, SIGKDD Explor. Newsl. 14 (2) (2013) 49–58, http://dx.doi.org/10.1145/2481244.2481252.
[53] F.T. Liu, K.M. Ting, Z.H. Zhou, Isolation-based anomaly detection, ACM Trans. Knowl. Discov. Data 6 (1) (2012) http://dx.doi.org/10.1145/2133360.2133363.
[54] F.T. Liu, K.M. Ting, Z.H. Zhou, Isolation forest, in: 2008 Eighth IEEE International Conference on Data Mining, 2008, pp. 413–422, http://dx.doi.org/10.1109/ICDM.2008.17.
[55] Z. He, X. Xu, S. Deng, Discovering cluster-based local outliers, Pattern Recognit. Lett. 24 (9) (2003) 1641–1650, http://dx.doi.org/10.1016/S0167-8655(03)00003-5.
[56] M.M. Breunig, H.P. Kriegel, R.T. Ng, J. Sander, LOF: Identifying density-based local outliers, SIGMOD Rec. 29 (2) (2000) 93–104, http://dx.doi.org/10.1145/335191.335388.
[57] R. Zheng, J. Gu, Anomaly detection for power system forecasting under data corruption based on variational auto-encoder, in: 8th Renewable Power Generation Conference, RPG 2019, 2019, pp. 1–6, http://dx.doi.org/10.1049/cp.2019.0461.
[58] R. Yao, C. Liu, L. Zhang, P. Peng, Unsupervised anomaly detection using variational auto-encoder based feature extraction, in: 2019 IEEE International Conference on Prognostics and Health Management, ICPHM, 2019, pp. 1–7, http://dx.doi.org/10.1109/ICPHM.2019.8819434.
[59] Y. Aizenbud, O. Lindenbaum, Y. Kluger, Probabilistic robust autoencoders for anomaly detection, 2021, arXiv:2110.00494.

[60] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: A simple way to prevent neural networks from overfitting, J. Mach. Learn. Res. 15 (56) (2014) 1929–1958, http://jmlr.org/papers/v15/srivastava14a.html.
[61] C.P. Burgess, I. Higgins, A. Pal, L. Matthey, N. Watters, G. Desjardins, A. Lerchner, Understanding disentangling in β-VAE, CoRR (2018) arXiv:1804.03599.
[62] L. Zhou, W. Deng, X. Wu, Unsupervised anomaly localization using VAE and beta-VAE, 2020, http://dx.doi.org/10.48550/ARXIV.2005.10686.
[63] J. Cai, J. Luo, S. Wang, S. Yang, Feature selection in machine learning: A new perspective, Neurocomputing 300 (2018) 70–79, http://dx.doi.org/10.1016/j.neucom.2017.11.077.
[64] C. Suman, S. Tripathy, S. Saha, Building an effective intrusion detection system using unsupervised feature selection in multi-objective optimization framework, 2019, arXiv preprint arXiv:1905.06562.
[65] A.J. Ferreira, M.A.T. Figueiredo, Efficient feature selection filters for high-dimensional data, Pattern Recognit. Lett. 33 (13) (2012) 1794–1804, http://dx.doi.org/10.1016/j.patrec.2012.05.019.
[66] L.S. Shapley, 17. A value for n-person games, in: H.W. Kuhn, A.W. Tucker (Eds.), Contributions to the Theory of Games (AM-28), vol. II, Princeton University Press, 2016, pp. 307–318, http://dx.doi.org/10.1515/9781400881970-018.
[67] C. Molnar, Interpretable Machine Learning, Independently published, 2022.
[68] OpenWrt Project, OpenWrt project: Welcome to the OpenWrt project, 2021, https://openwrt.org/. (Accessed January 2021).
[69] BMX6 mesh networking protocol, http://bmx6.net. (Accessed January 2021).
[70] L. Cerdà-Alabern, A. Neumann, L. Maccari, Experimental evaluation of BMX6 routing metrics in a 802.11an wireless-community mesh network, in: 2015 3rd International Conference on Future Internet of Things and Cloud, 2015, pp. 770–775.
[71] GuifiSants, qMp Sants-UPC, 2021, http://dsg.ac.upc.edu/qmpsu. (Accessed January 2021).
[72] L. Cerdà-Alabern, A. Neumann, P. Escrich, Experimental evaluation of a wireless community mesh network, in: The 16th ACM International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems, MSWiM'13, ACM, Barcelona, Spain, 2013.
[73] F. Pedregosa, et al., Scikit-learn: Machine learning in Python, J. Mach. Learn. Res. 12 (2011) 2825–2830.
[74] Y. Zhao, Z. Nasrullah, Z. Li, PyOD: A Python toolbox for scalable outlier detection, J. Mach. Learn. Res. 20 (96) (2019) 1–7, http://jmlr.org/papers/v20/19-011.html.
[75] A. Mahfouz, A. Abuhussein, D. Venugopal, S. Shiva, Ensemble classifiers for network intrusion detection using a novel network attack dataset, Future Internet 12 (11) (2020) http://dx.doi.org/10.3390/fi12110180.
[76] S.M. Lundberg, S.I. Lee, A unified approach to interpreting model predictions, in: I. Guyon, U.V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Garnett (Eds.), Advances in Neural Information Processing Systems 30, Curran Associates, Inc., 2017, pp. 4765–4774.
[77] L. McInnes, J. Healy, S. Astels, Hdbscan: Hierarchical density based clustering, J. Open Source Softw. 2 (11) (2017) 205, http://dx.doi.org/10.21105/joss.00205.
[78] M. Ester, H.P. Kriegel, J. Sander, X. Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, in: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, KDD '96, AAAI Press, 1996, pp. 226–231.
[79] H. Zenati, M. Romain, C.S. Foo, B. Lecouat, V.R. Chandrasekhar, Adversarially learned anomaly detection, 2018, http://dx.doi.org/10.48550/ARXIV.1812.02288.
[80] T. Schlegl, P. Seeböck, S.M. Waldstein, U. Schmidt-Erfurth, G. Langs, Unsupervised anomaly detection with generative adversarial networks to guide marker discovery, 2017, http://dx.doi.org/10.48550/ARXIV.1703.05921.
[81] M. Fil, M. Mesinovic, M. Morris, J. Wildberger, Beta-VAE reproducibility: Challenges and extensions, 2021, http://dx.doi.org/10.48550/ARXIV.2112.14278, https://arxiv.org/abs/2112.14278.
[82] P. Liznerski, L. Ruff, R.A. Vandermeulen, B.J. Franks, M. Kloft, K.R. Müller, Explainable deep one-class classification, 2020, http://dx.doi.org/10.48550/ARXIV.2007.01760.
[83] H. Alshammari, O. Ghorbel, M. Aseeri, M. Abid, Non-negative matrix factorization (NMF) for outlier detection in wireless sensor networks, in: 2018 14th International Wireless Communications & Mobile Computing Conference, IWCMC, 2018, pp. 506–511, http://dx.doi.org/10.1109/IWCMC.2018.8450421.
