0% found this document useful (0 votes)
73 views9 pages

Anomaly Detection in Self-Organizing Networks - Conventional Versus Contemporary Machine Learning

This document summarizes a study that compares conventional machine learning and deep learning approaches for anomaly detection in self-organizing networks. The study claims to achieve state-of-the-art performance on a public dataset, with an average 15% improvement over previous deep learning methods. It demonstrates that support vector machines using data augmentation can outperform autoencoders. Conventional machine learning provided nearly two orders of magnitude faster computation and an order of magnitude fewer parameters than deep learning, making it a robust alternative for critical detection times in 5G networks.

Uploaded by

Sudhakar HOD i/c
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
73 views9 pages

Anomaly Detection in Self-Organizing Networks - Conventional Versus Contemporary Machine Learning

This document summarizes a study that compares conventional machine learning and deep learning approaches for anomaly detection in self-organizing networks. The study claims to achieve state-of-the-art performance on a public dataset, with an average 15% improvement over previous deep learning methods. It demonstrates that support vector machines using data augmentation can outperform autoencoders. Conventional machine learning provided nearly two orders of magnitude faster computation and an order of magnitude fewer parameters than deep learning, making it a robust alternative for critical detection times in 5G networks.

Uploaded by

Sudhakar HOD i/c
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 9

Received May 15, 2022, accepted May 26, 2022, date of publication June 10, 2022, date of current

version June 15, 2022.


Digital Object Identifier 10.1109/ACCESS.2022.3182014

Anomaly Detection in Self-Organizing Networks:


Conventional Versus Contemporary
Machine Learning
MUHAMMED FURKAN KUCUK , (Graduate Student Member, IEEE),
AND ISMAIL UYSAL , (Member, IEEE)
Department of Electrical Engineering, University of South Florida, Tampa, FL 33620, USA
Corresponding author: Muhammed Furkan Kucuk ([email protected])
This work was supported in part by the Republic of Turkey’s Ministry of National Education through study abroad programme.

ABSTRACT This paper presents a comparison of conventional and modern machine (deep) learning
within the framework of anomaly detection in self-organizing networks. While deep learning has gained
significant traction, especially in application scenarios where large volumes of data can be collected and
processed, conventional methods may yet offer strong statistical alternatives, especially when using proper
learning representations. For instance, support vector machines have previously demonstrated state-of-
the-art potential in many binary classification applications and can be further exploited with different
representations, such as one-class learning and data augmentation. We demonstrate for the first time, on a
previously published and publicly available dataset, that conventional machine learning can outperform the
previous state-of-the-art using deep learning by 15% on average across four different application scenarios.
Our results further indicate that with nearly two orders of magnitude improvement in computational speed
and an order of magnitude reduction in trainable parameters, conventional machine learning provides a robust
alternative for 5G self-organizing networks especially when the execution and detection times are critical.

INDEX TERMS Anomaly detection, deep learning, machine learning, mobile network communications,
self organizing networks.

I. INTRODUCTION intelligent applications on massive machine-type communi-


As a next-generation telecommunication technology, 5G cations (mMTC) in 5G or massive MIMO in wireless sensor
brings a novel perspective and innovative solutions to the networks (WSNs), to achieve better service quality through
increased demand of humans and autonomous devices [1]. improved IoT connectivity as well as to extend battery life and
This technology mainly focuses on five areas, including boost spectral efficiency by utilizing channel aware decision
dense-device structures, high carrier frequency bands such fusion methodology [3], [4].
as millimeter wave (mmWave), multi-connectivity such as In previous mobile networks, such as 4G, autonomous
massive MIMO, smart devices, and massive machine-type mobile networks or self-organizing networks (SONs) with
communications. To fulfill these requirements in such a capabilities such as self-planning, self-configuring, self-
dynamic digital world, the next-generation cellular networks optimizing, and self-healing, have been shown to signifi-
must be adaptable with predictive capabilities due to the cantly contribute to reducing network failures and boosting
ever changing environment of the nested services interacting performance without human intervention such as AirHop’s
with each other [2]. Hence, artificial intelligence (AI) has eSON [5]. However, current solutions in the market generally
garnered increased interest as a potential 5G technology to lack smart functionalities, especially for cell outage manage-
handle the dynamic environment in analyzing and contribut- ment (COM) to heal autonomically [6]. If there is no traffic
ing to the execution of the network. There are examples of due to a specific network issue, it signals an anomaly where
one or more cells may be in outage and it is vital to detect the
The associate editor coordinating the review of this manuscript and outage to resume service within the shortest time possible.
approving it for publication was Bilal Alatas . For instance, Ericsson lost at least $100 million because

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
61744 VOLUME 10, 2022
M. F. Kucuk, I. Uysal: Anomaly Detection in SONs: Conventional Versus Contemporary Machine Learning

of a network outage [7], which could have been prevented TABLE 1. Abbreviations used in this study.
using AI-powered SONs. In consideration of these issues,
the European Telecommunications Standards Institute (ETSI)
introduced a zero-touch group to research automation to
advance machine learning and AI techniques (deep learning)
specifically for anomaly detection applications on mobile
networks.
This paper has four main contributions to the field as
summarized below:
• We present a comprehensive analysis of a conventional
machine learning method for anomaly detection in self-
organizing 5G networks (5G-SONs) and compare it
with a popular deep learning alternative using different
learning representations, including one-class and binary
learning.
• We claim state-of-the-art performance on a publicly
available dataset [8], which investigates multiple use
case scenarios for anomaly detection in 5G-SONs.
We demonstrate an average improvement of 15% over
the best recent performance which was achieved by a
deep auto-encoder-based setup.
• We demonstrate for the first time that data augmentation
methods can further boost anomaly detection perfor-
mance in binary mode, even when utilizing conventional
algorithmic methods such as support vector machines on
a sufficiently large dataset.
• Finally, we achieve nearly two orders of magnitude
improvement in computational speed and an order of
magnitude reduction in trainable parameters using con-
ventional machine learning to provide a robust alterna-
tive for 5G self-organizing networks especially when the In another example, [6] utilizes the measurements and han-
execution and detection times are critical. dover statistics (inHO) from adjacent cells in a mobile com-
munications network to expose abnormalities and outages.
The rest of this paper is organized as follows. Section II Monitoring in this way provides the potential status of a cell
provides a brief summary of prior work on anomaly detection outage where the inHO information becomes zero. In [10],
in current mobile and self-organizing networks. Section III a novel online anomaly detection system was proposed in
introduces the methods used in this study. Section IV mobile networks to identify anomalies in key performance
describes the experimental setup in detail and provides indicators (KPIs). The proposed system consists of a training
the hyper-parameters, dataset characteristics, implementa- and detection/tracking block. The system learns the most
tion, evaluation metrics, and necessary details for repeata- detrimental anomalies in the training block as each recent
bility. Finally, the results and discussions are presented in KPI is sourced, and monitors its status until the end of the
Section V, followed by the conclusions in Section VI. The second block. Thus, the detection of anomalies has been set
abbreviations used in this paper have been provided in to prefer highly possible anomalies in the long term. More-
Table 1 for easy reference. over, the system objects to provide the minimum amount of
anomalies by maintaining a low positive rate on behalf of
II. RELATED RESEARCH network operators’ efforts to deal with only real anomalies.
Anomaly detection in communications has been an active In addition, the system can be extended to next-generation
research area over the last decade. For instance, in [9], abnor- networks via automatic adaptation features to a new network
mal activity in the wireless spectrum has been explored. behavior profile.
Specifically, the authors used power spectral density (PSD) The authors in [11] proposed unsupervised learning
information to identify and pinpoint anomalies in the form to detect anomalies related to mobility using mobility
of either undesired signals present in the licensed band or robustness optimization (MRO), which is an important
the absence of a desired signal. The information obtained use case of SONs for modern 4G and 5G networks.
from the PSD was processed using a combination of adver- A similar study in [12] brings a different perspective
sarial auto-encoders, convolutional neural networks, and long by using reinforcement learning to degrade call drop
short-term memory recurrent neural networks. rates.

VOLUME 10, 2022 61745


M. F. Kucuk, I. Uysal: Anomaly Detection in SONs: Conventional Versus Contemporary Machine Learning

In [13], the authors used an autoencoder-based network


anomaly detection method to detect the ratio of changes
in features that reflect non-linearity to increase detection
accuracy. The authors used a convolutional autoencoder
for dimensionality reduction and outperformed more con-
ventional methods in wireless communications to detect
cyberattacks.
Artificial intelligence (AI) applications supporting 5G
technologies can be implemented both in the physical and
network layers. For instance, authors in [14] present an excel-
lent overview of deep learning (DL) applications for the phys-
ical layer. One of the applications is a novel modification of FIGURE 1. Grid based layout used in the simulation of cell outage
the standard autoencoder called the ‘‘channel autoencoder.’’ detection.
In a typical autoencoder, the goal is to find the most com-
pressed representation of the inputs in the encoding layer with stations (BS) and the BS can fail to communicate with the
minimal loss at the output layer. In [14], the authors aim to user by performing below the expected key performance indi-
find the most robust representations of the input (messages) cator (KPI) margins. Each BS is represented as a circle and
to account for the channel degradation by adding redundan- in each circle, there are 3 cells regularly spread out shown in
cies instead of removing them. The authors further extend Fig. 1, for a total of 21 cells. Minimization of drive test (MDT)
this concept to an adversarial network of multiple transmit- report is generated for each user at a 5kHz sampling rate (i.e.,
ter/receiver pairs aiming for increased capacity. Furthermore, once every 0.2 ms) with a total recording time of 50 seconds to
they discuss augmented DL models using radio transformer obtain the associated data. The KPIs including the RSRP and
networks (RTNs). The RTNs carry channel domain informa- RSRQ measurements are collected from each of the 7 base
tion and simplify the receiver’s job for symbol detection using stations at this time. The data for each MDT report is then
a neural network estimator to obtain the best parameters for assigned a class label from the numerical set 0,1,2,3.
symbol detection. The augmented DL models on complex I/Q The three labels; 0,1, and 2 indicate normal network status
samples for modulation classification demonstrated that the with varying levels of discrete transmitter power, whereas
DL models outperformed the classification methods based on label 3 indicates that the base transceiver station (BTS) has
expert features. Another study discussed in [15] introduces failed to communicate with the user, which is classified as
designing mobile traffic classifiers based on the DL utiliza- an anomaly in the dataset. Three other scenarios are imple-
tion. A systematic framework of new DL-adopted Traffic mented in a similar way with the only differences being the
Classification (TC) structures is introduced and analyzed. numbers of users (210, 105, 105) and cells (57, 57, 57) to be
Rather than mobility, the study includes a wide allocation able to consider the anomaly detection (AD) application for
range to encrypt the TC. a wider range of networks and users with different numbers
These studies generally rely on multi-class applications of of observations.
deep neural network methods learning on features such as key Fig. 1 shows the structure of the communications network
performance indicators, handover statistics, reference signal for the case of 7 BS and 3 cells within each grid used in the
received power and quality (RSRP and RSRQ), number of first scenario for outage detection. This figure is provided
connection drops and failures. However, there exists a discus- to help with the visualization of the AD application and is
sion in the research community where other learning repre- similar for different scenarios using a range of BS, cell and
sentations such as one-class learning, and deep unsupervised user numbers.
methods have the potential to become strong alternatives for
anomaly detection in SONs [8]. Similarly, an ever-present B. SUPPORT VECTOR MACHINE
debate investigates the comparative effectiveness of empirical The support vector machine (SVM) [17] is a popular machine
deep learning models and conventional approaches such as learning algorithm that plays important roles in binary classi-
feature engineering for a range of temporal applications [16]. fication and unsupervised feature learning. An SVM model
A categorical analysis of literature discussed in this section represents and separates various classes by building a set
can be found in Table 2, which displays the important char- of hyperplanes in a multi-dimensional space. Separation of
acteristics and features such as the dataset type, topology of the classes from each other is performed iteratively by the
the network used, which learning methodology is applied for SVM algorithm through finding the optimal hyperplane (the
which application, etc. decision region between a group of objects from different
classes). In this context, the optimal hyperplane maximizes
III. METHODS the distance gap (margin) between the two lines closest to
A. ANOMALY DETECTION PROBLEM DEFINITIONS the data points from different classes based on the support
In this paper, we look at anomaly detection scenarios where a vectors (data sample points where the classification error is
total of 105 mobile users are uniformly spread around 7 base minimum).

61746 VOLUME 10, 2022


M. F. Kucuk, I. Uysal: Anomaly Detection in SONs: Conventional Versus Contemporary Machine Learning

TABLE 2. Summary of related research.

VOLUME 10, 2022 61747


M. F. Kucuk, I. Uysal: Anomaly Detection in SONs: Conventional Versus Contemporary Machine Learning

We implement SVM in two different ways:

1) ONE CLASS ANOMALY DETECTION WITH SVM


One-class SVM is a popular method for unsupervised
anomaly detection [18]–[21]. In general, an SVM is applied
to binary classification tasks. In the one-class anomaly detec-
tion case, the algorithm is trained only with observations from
the majority class to learn the ‘‘normal’’ samples. When new
data are presented to the SVM algorithm the ‘‘anomaly’’ sam- FIGURE 2. Dataset structure for the first use case.
ples generate a lower decision probability compared to the
observations from the majority class under ideal conditions. During training, the auto-encoder maps the input xεRy to
the hidden layers with lesser dimensions than the input data
2) BINARY ANOMALY DETECTION WITH SVM which produces a compressed representation of the original
A binary SVM classifier provides the most accurate results data in which its dimensionality is reduced to the code(latent)
when trained on balanced datasets. When the dataset is imbal- layer size HεRh . This first step is called the ‘‘encoder’’
anced, the SVM classifier model is inclined to overfit to the which provides the compressed information to be mapped
majority class and shows poor performance for the minority to the output layer via the ‘‘decoder,’’ through a process
class with reduced generalization. To overcome the imbal- called ‘‘reconstruction.’’ Mathematically, these two steps are
ance in a dataset, different techniques have been introduced formulated as follows:
in the past one of which, the synthetic minority oversampling H ≡ fWH (x) = f (WH x + bH ) (2)
technique or SMOTE, has gained increased popularity in
z ≡ gWz (x) = g(Wz H + bZ ) (3)
cases of severe imbalances such as anomaly detection.
where WH and WZ define the encoding weight and decod-
C. SMOTE ALGORITHM ing weight, respectively, bH and bZ define the correspond-
The synthetic minority oversampling technique, or ing encoding bias vector and the decoding bias vector, and
SMOTE [22], is one of the most popular methods for over- f (.) and g(.) are encoding and decoding activation functions
sampling to overcome the imbalance problem. Its goal is to respectively such as a sigmoid function or a rectified linear
balance class distribution by randomly increasing the minor- unit. As previously mentioned, the primary purpose of the
ity class by generating ‘‘synthetic’’ patterns based on features autoencoder is to learn useful latent information on the coding
instead of raw data. The oversampling process of the minority layer by minimizing the reconstruction error. For a given N
class (C) begins by selecting each minority class sample input data samples, the following loss function is used to
(X), where CX and interpolating synthetic instances along determine the parameters ‘‘WH , WZ , bH , and bZ ’’ through a
the lines that connect minority instances and their k-nearest back-propagation algorithm commonly used in feed-forward
neighbors x. The k value is chosen randomly according to the neural networks:
hyper-parameter oversampling rate N based on the number N
of minority samples. First, a distance-based method, such as 1 X
LAE = min ||xk − zk ||2 (4)
the ‘‘Euclidean distance,’’ is used to calculate the distance N
k=1
between the feature vector and its neighbors. Second, the
Within the context of this application, the autoencoder is
distance is multiplied by a random number between (0,1]
used for anomaly detection by training the network using
and added to the previous feature sample. Hence, new and
normal observations and once trained, comparing the recon-
synthetic features were generated along the line segments
struction error of the normal and anomalous samples to a
between the two original samples. Mathematically:
threshold for detection. It is hypothesized that the normal
X 0 = X + rand(0, 1)∗ |X − Xk | (1) samples will provide lesser reconstruction errors compared
to the anomalous samples which can easily be characterized
where X 0 is the new set of synthetic samples, and Xk is the set using the receiever-operating-characteristics (ROC). More
of randomly selected k-nearest neighbor samples. specifically, in [8], the autoencoder model structure start with
an input vector of size 20 (which corresponds to 10 RSRP
D. AUTOENCODER and 10 RSRQ measurements), followed by four hidden layers
Autoencoder is a particular artificial neural network topology including 12, 6, 6 and 12 neurons to implement a general
which has the same input and output layers where the training topology as follows: 20-12-6-6-12-20.
is performed by presenting the same input data to both layers
simultaneously. The general structure of the auto-encoder IV. EXPERIMENTAL SETUPS
consists of a visible input layer x, a number of hidden layers h A. DATASETS
and the reconstructed output layer y with a family of nonlinear In this work, we use the dataset created by a SON simulator
activation functions f applied at different layers. previously introduced in [8] and made available to the public

61748 VOLUME 10, 2022


M. F. Kucuk, I. Uysal: Anomaly Detection in SONs: Conventional Versus Contemporary Machine Learning

FIGURE 3. The ROC curves for the four datasets for the comparative analysis of modern and conventional machine learning with the results reported in
the source paper.

for further research [23]. The dataset includes four different the collected information is anomalous. The feature vector
application scenarios where the data is collected periodically consists of the RSRP and RSRQ measurements from dif-
(with a 5 kHz sampling rate) from a minimization of drive ferent cell locations (i.e., RSRP1, RSRQ1, RSRP2, RSRQ2,
test report [24], which includes mobile user information . . . , RSRP10, RSRQ10). The dataset has 11674 observations
regarding the user activities recorded in the enclosed regions where only 60 of them are anomalous (i.e., 1 in 200) measure-
around the base stations (cells) in certain measurement peri- ment samples and 25 features including the feature vector,
ods. There are four different datasets with different numbers user ID, location, and class labels.
of users and use-cases. Fig. 2 demonstrates the basic struc- The second dataset is similar to the first one in terms of
ture of the dataset for the first use case (dataset 1), which the number of features and measurements except that it has
consists of the measurement time, a unique ID assigned to a much less frequent anomaly rate with 8382 observations,
each user, the coordinates of user locations in two dimensions where only eight of them are anomalous (i.e., approximately
at the time of measurement, the reference signal received 1 in 1000).
power (RSRP), the reference signal received quality (RSRQ), Datasets 3 and 4 represent different use cases, both of
and the label that shows if the associated entry (i.e., the collec- which have 114 features with a longer (80s) recording time,
tion of measurements associated with that user) is anomalous which resulted in a much larger observation base (42000).
(1) or not (0). RSRP and RSRQ are significant measures As in datasets 1 and 2, they represent different anomaly rates
of signal level and quality in LTE networks designated as where dataset 3 has a much larger sampling of anomalous
key performance indicators (KPIs) in identifying whether measurements (9635) compared to dataset 4 (22) because

VOLUME 10, 2022 61749


M. F. Kucuk, I. Uysal: Anomaly Detection in SONs: Conventional Versus Contemporary Machine Learning

only the entries below the -120 dB RSRP measurement


threshold were labeled as such.

B. HYPER-PARAMETERS
The most significant hyper-parameters in this study are the
oversampling rate N and the number of nearest neighbors k,
specifically for the SMOTE algorithm in applying oversam-
pling for the binary classification case. In applying SMOTE,
we tested two different sets of hyper-parameters with N =
300, k = 4 and N = 500, k = 5.

C. IMPLEMENTATION
All implementations were performed using MATLAB 2020b.
We tested both one-class and binary SVM models on all
datasets. The SMOTE algorithm was used with the binary
SVM model to adjust for the imbalanced datasets. The FIGURE 4. Area under curve (AUC) scores across different datasets.
datasets were preprocessed, where the time, UserID, and
TABLE 3. Representation of AUC scores of all datasets for one-class SVM
location features were not included in the training process for and binary SVM classifiers.
fairness. Approximately 10 % of the normal and anomalous
samples were separated for testing. Both the training and
testing samples were normalized to between (0,1].
The one-class SVM model was trained with only normal
samples using Gaussian RBF kernels. After training, we gen-
erated the SVM probability outputs with test samples, includ-
ing both normal and anomalous samples, to obtain receiver
operating characteristic (ROC) curves along with area-under-
the-curve (AUC) scores as performance metrics.
The binary SVM model is trained in exactly the same fash-
ion except that in addition to the above process, the anomaly are represented in different colors, including binary and
samples are oversampled with SMOTE to generate balanced one-class combinations with and without augmentation using
datasets prior to training and testing the algorithm. SMOTE. The results are generally consistent except in the
case of dataset 4, where all SVM combinations outperformed
D. EVALUATION METRICS the deep autoencoder at all levels of TPR & FPR. In the case
In this study, we evaluated performance by looking at ROC of the first three datasets, the deep autoencoder outperformed
curves and AUC scores. The ROC is a probability curve the SVM implementations without augmentation. However,
that shows the model’s ability to identify the positive class when SMOTE is used, both one-class and binary modalities
appropriately. It is plotted with the true positive rate (TPR) on of the SVM demonstrate higher performance compared to the
the y-axis and false positive rate (FPR) on the x-axis, where original paper, and in some cases significantly so.
TPR is the percentage of correctly classified positive outputs The results are further summarized in Fig. 4, which pro-
and FPR is the percentage of incorrectly classified positive vides an overview of AUC scores across different datasets and
outputs, as expressed below: algorithms where both binary and one-class SVMs (with the
TP FP three curves at the top) clearly outperform the deep autoen-
TPR = FPR = (5) coder approach shown in red. Table 3 provides the absolute
TP + FP FP + TN
numbers in terms of the AUC with 5% significance. The best
The AUC, on the other hand, provides a summarized number combination of SMOTE and SVM modality outperformed
as an indication of how powerful the model is in discrimi- the deep autoencoder by 19.75%, 15.5%, 15.5%, and 13%
nating between classes with the mathematical expression as for datasets 1, 2, 3, and 4, respectively. This corresponds to
below: an average performance improvement of over 15% across all
TP + TN the application scenarios. It is important to note that even
AUC = (6)
TP + FP + FN + TN without artificial augmentation of the dataset, conventional
machine learning using SVM still outperforms the previous
V. RESULTS AND DISCUSSION state-of-the-art methods reported in the literature. However,
The ROC curves for each of the four datasets are shown the difference in performance was noticeably less.
in Fig. 3. The red curve in each figure represents the latest In order to study whether the SMOTE algorithm would
state-of-the-art reported on this dataset using a deep autoen- provide a similar performance boost for the autoencoder
coder [8]. For comparison, different SVM implementations setup, we focused on the first two datasets where the

61750 VOLUME 10, 2022


M. F. Kucuk, I. Uysal: Anomaly Detection in SONs: Conventional Versus Contemporary Machine Learning

imbalance is much more significant as described in the was done using a native script whereas for the binary SVM,
Asghar et. al. [8] study. We observed that the average detec- a GUI toolbox was used with better visualization capabilities
tion accuracy with the following SMOTE hyperparameters: which affects speed. However, there should be no difference
N=300, k=4 and N=500, k=5 were 64.03% and 57% respec- in testing times since the SVM topologies are identical with
tively on dataset 1 and 68.87% and 51.64% respectively on the only difference being the way the data is represented to
dataset 2. These results are either nonsignificant improve- the algorithms.
ments over the baseline performance or in fact even worse In terms of the trainable parameters as an indicator of
compared to not applying SMOTE on the dataset. There are computational complexity, the SVM models on average had
many explanations for this but the most likely one is that the 23 support vectors compared to the 660 weight and 76 bias
operational structure of the latent space in an autoencoder parameters (total of 736) that need to be trained for the
is very similar to the way SMOTE calculates augmented autoencoder. Based on the above measurements for computa-
samples – in other words, the advantage of using SMOTE is tional times and complexity, SVM-based approaches are less
negated in the latent coding layer of the autoencoder itself. complex even when using SMOTE, have competitive AUC
It is important to discuss the nonsignificant or in some scores, and are thus more suitable for time-critical scenarios
cases negative impact of SMOTE on the performance of such as anomaly detection / outage recovery compared to
one-class SVM topology. SMOTE works by generating sam- the AE.
ples based on the nearest neighbor similarities of intra-
VI. CONCLUSION
class samples and the differences of interclass samples. This
In this study, we explored the premise of conventional
method works best when training a binary classifier where
machine learning when compared to deep learning for
one class may be less represented than another class as
anomaly detection in SONs. Anomaly detection was a pop-
evident by the performance boost observed in binary SVM
ular application area of deep learning for cell outages in
training. However, in one-class learning representation, only
communication networks. However, as in other domains,
the majority class is used in the training which would mean
conventional methods can still provide strong statistical alter-
that SMOTE could only have an indirect effect on the number
natives to the right learning representations. In this paper,
and quality of the samples generated for the normal class
we focused on SVMs with one-class and binary learning
and subsequently does not have a direct role in boosting
scenarios on a previously published and publicly available
performance. The drop in performance in some cases can
dataset. We found that while deep learning was highly com-
similarly be linked to the quality of the anomaly class samples
petitive, standard SVMs using RBF kernels, can be trained to
being generated (and thus effecting testing performance) not
outperform a deep autoencoder approach. Both one-class and
making up for the additional information which now cannot
binary classification can benefit immensely from synthetic
be used in the training process.
augmentation of the dataset using SMOTE with improve-
A. COMPUTATIONAL COMPLEXITY ANALYSIS ments in detection accuracy by as much as 15% on average
over four different application scenarios.
Computational complexity analysis is an important step in
Future work will study the impact of augmentation on
identifying the strengths and weaknesses of conventional
other learning algorithms, specifically statistical deep learn-
algorithm such as one-class and binary SVMs when com-
ing, such as variational auto-encoders. Work presented in this
pared to the more modern approaches such as the autoen-
paper can further be extended to other applications beyond
coders used in anomaly detection. In this paper we focused
anomaly or outage detection. Specifically, there has been
on both the raw computational times specifically for testing
increased attention to modulation detection in next genera-
phase for each of the four datasets scenarios as well as the
tion mobile wireless networks where fast, robust, and light
number of trainable parameters for both algorithms. We also
machine learning models could enable time-critical appli-
compared how SMOTE affected the complexity.
cations in signal classification and modulation detection.
All measurements are done using the latest version of
Improvements in speed can be realized both at the algorithm
MATLAB at the time of this writing (2021b) using the stan-
level and data preprocessing stages using techniques such as
dard time measurement scripts. On the first dataset, for the
principal component analysis to identify the most relevant
one-class SVM algorithm, the testing time took 90 ms without
features for classification and detection. Finally, statistical
SMOTE and 140 ms with SMOTE, whereas it took 9000 ms
learning algorithms, such as Gaussian Process Regression,
to run the autoencoder, almost a 100-fold increase. Additional
which have gained immense popularity as alternatives to deep
complexity of SMOTE is more than outclassed by its signif-
learning can be applied to different scenarios especially when
icant contribution to the accuracy. We have observed similar
data is not present in sufficiently large volumes to properly
computational times for the rest of the use-case scenarios
train DL models with many parameters.
(i.e., for the second dataset one-class SVM testing times were
110 - 130 ms (SMOTE) compared to 7120 ms for the AE)
REFERENCES
where there were orders of magnitude improvement in testing
[1] F. Boccardi, R. W. Heath, A. Lozano, T. L. Marzetta, and P. Popovski,
times. We have performed the computational speed analysis ‘‘Five disruptive technology directions for 5G,’’ IEEE Commun. Mag.,
using one-class SVM due to the fact that the implementation vol. 52, no. 2, pp. 74–80, Feb. 2014.

VOLUME 10, 2022 61751


M. F. Kucuk, I. Uysal: Anomaly Detection in SONs: Conventional Versus Contemporary Machine Learning

[2] J. G. Andrews, S. Buzzi, W. Choi, S. V. Hanly, A. Lozano, A. C. Soong, [21] V. Palade, ‘‘Class imbalance learning methods for support vector
and J. C. Zhang, ‘‘What will 5G be?’’ IEEE J. Sel. Areas Commun., vol. 32, machines,’’ in Imbalanced Learning: Foundations, Algorithms, and Appli-
no. 6, pp. 1065–1082, Jun. 2014. cations. Hoboken, NJ, USA: Wiley, 2013, p. 83.
[3] A.-S. Bana, E. de Carvalho, B. Soret, T. Abrão, J. C. Marinello, [22] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, ‘‘SMOTE:
E. G. Larsson, and P. Popovski, ‘‘Massive MIMO for Internet of Things Synthetic minority over-sampling technique,’’ J. Artif. Intell. Res., vol. 16,
(IoT) connectivity,’’ Phys. Commun., vol. 37, Dec. 2019, Art. no. 100859. no. 28, pp. 321–357, Jun. 2006.
[4] D. Ciuonzo, P. S. Rossi, and S. Dey, ‘‘Massive MIMO channel-aware [23] M. Z. Asghar. Zeeshan: Simulation Data Sets. Accessed: Oct. 6, 2022.
decision fusion,’’ IEEE Trans. Signal Process., vol. 63, no. 3, pp. 604–619, [Online]. Available: https://github.com/Zeeshan-bit/Sim-Dim
Feb. 2015. [24] J. Johansson, W. A. Hapsari, S. Kelley, and G. Bodog, ‘‘Minimization of
[5] Airhop. Accessed: Jun. 18, 2021. [Online]. Available: http://www. drive tests in 3GPP release 11,’’ IEEE Commun. Mag., vol. 50, no. 11,
airhopcomm.com pp. 36–43, Nov. 2012.
[6] I. de-la-Bandera, R. Barco, P. Muñoz, and I. Serrano, ‘‘Cell outage detec-
tion based on handover statistics,’’ IEEE Commun. Lett., vol. 19, no. 7,
pp. 1189–1192, Jul. 2015.
[7] Ericson. Network Outage. Accessed: Jun. 18, 2021. [Online]. Available:
http://telecoms.com/494091/
[8] M. Z. Asghar, M. Abbas, K. Zeeshan, P. Kotilainen, and T. Hämäläinen, MUHAMMED FURKAN KUCUK (Graduate Stu-
‘‘Assessment of deep learning methodology for self-organizing 5G net- dent Member, IEEE) received the B.S. degree
works,’’ Appl. Sci., vol. 9, no. 15, p. 2975, Jul. 2019. in electrical and electronics engineering from
[9] S. Rajendran, W. Meert, V. Lenders, and S. Pollin, ‘‘Unsupervised wireless Gaziantep University, Gaziantep, Turkey, in 2014,
spectrum anomaly detection with interpretable features,’’ IEEE Trans.
and the M.S. and Ph.D. degrees in electrical
Cognit. Commun. Netw., vol. 5, no. 3, pp. 637–647, Sep. 2019.
[10] J. Burgueño, I. de-la-Bandera, J. Mendoza, D. Palacios, C. Morillas, and
engineering from the University of South Florida
R. Barco, ‘‘Online anomaly detection system for mobile networks,’’ Sen- (USF), Tampa, FL, USA, in 2018 and 2022,
sors, vol. 20, no. 24, p. 7232, Dec. 2020. respectively.
[11] J. Moysen, F. Ahmed, M. Garcia-Lozano, and J. Niemela, ‘‘Unsupervised Since 2018, he has been a Research Assistant
learning for detection of mobility related anomalies in commercial LTE with USF. His research interests include machine
networks,’’ in Proc. Eur. Conf. Netw. Commun. (EuCNC), Jun. 2020, learning, deep neural networks for unsupervised learning, and ML applica-
pp. 111–115. tions in wireless communications. His award and honors include the Turkish
[12] W. Qin, Y. Teng, M. Song, Y. Zhang, and X. Wang, ‘‘AQ-learning approach Government Scholarship for M.S. and Ph.D. degrees.
for mobility robustness optimization in lte-son,’’ in Proc. 15th IEEE Int.
Conf. Commun. Technol., Nov. 2013, pp. 818–822.
[13] Z. Chen, C. K. Yeo, B. S. Lee, and C. T. Lau, ‘‘Autoencoder-based
network anomaly detection,’’ in Proc. Wireless Telecommun. Symp. (WTS),
Apr. 2018, pp. 1–5.
[14] T. O’Shea and J. Hoydis, ‘‘An introduction to deep learning for the physical ISMAIL UYSAL (Member, IEEE) received the
layer,’’ IEEE Trans. Cogn. Commun. Netw., vol. 3, no. 4, pp. 563–575, B.S. degree in electrical and electronics engi-
Dec. 2017. neering from Middle East Technical University,
[15] G. Aceto, D. Ciuonzo, A. Montieri, and A. Pescapé, ‘‘Mobile encrypted Ankara, Turkey, in 1998, and the M.S. and Ph.D.
traffic classification using deep learning: Experimental evaluation, lessons degrees in electrical and computer engineering
learned, and challenges,’’ IEEE Trans. Netw. Service Manag., vol. 16, no. 2, from the University of Florida (UF), Gainesville,
pp. 445–458, Feb. 2019. FL, USA, in 2006 and 2008, respectively.
[16] R. Kanjilal and I. Uysal, ‘‘The future of human activity recognition: Deep
From 2008 to 2010, he was a Postdoctoral
learning or feature engineering?’’ Neural Process. Lett., vol. 53, no. 1,
pp. 561–579, Feb. 2021.
Research Fellow with the Research Center for
[17] J. A. K. Suykens and J. Vandewalle, ‘‘Least squares support vector machine Food Distribution and Retailing, UF. Since 2010,
classifiers,’’ Neural Process. Lett., vol. 9, no. 3, pp. 293–300, Jun. 1999. he has been with the University of South Florida, where he is currently an
[18] V. Vapnik, The Nature of Statistical Learning Theory. New York, NY, USA: Assistant Professor of electrical engineering and the Director of the Radio
Springer, 2013. Frequency Identification (RFID) Laboratory for Applied Research, College
[19] C. Cortes and V. Vapnik, ‘‘Support-vector networks,’’ Mach. Learn., of Engineering. His research interests include deep machine learning theory
vol. 20, no. 3, pp. 273–297, 1995. and applications in semi-supervised and unsupervised settings, data-oriented
[20] B. E. Boser, I. M. Guyon, and V. N. Vapnik, ‘‘A training algorithm for applications of RFID systems in healthcare and food supply chains, and
optimal margin classifiers,’’ in Proc. 5th Annu. Workshop Comput. Learn. signal processing algorithms for brain–computer interfaces.
Theory (COLT), 1992, pp. 144–152.

61752 VOLUME 10, 2022

You might also like