Anomaly Detection in Self-Organizing Networks - Conventional Versus Contemporary Machine Learning
ABSTRACT This paper presents a comparison of conventional and modern (deep) machine learning
within the framework of anomaly detection in self-organizing networks. While deep learning has gained
significant traction, especially in application scenarios where large volumes of data can be collected and
processed, conventional methods may yet offer strong statistical alternatives, especially when using proper
learning representations. For instance, support vector machines have previously demonstrated state-of-
the-art potential in many binary classification applications and can be further exploited with different
representations, such as one-class learning and data augmentation. We demonstrate for the first time, on a
previously published and publicly available dataset, that conventional machine learning can outperform the
previous state-of-the-art using deep learning by 15% on average across four different application scenarios.
Our results further indicate that with nearly two orders of magnitude improvement in computational speed
and an order of magnitude reduction in trainable parameters, conventional machine learning provides a robust
alternative for 5G self-organizing networks especially when the execution and detection times are critical.
INDEX TERMS Anomaly detection, deep learning, machine learning, mobile network communications,
self-organizing networks.
of a network outage [7], which could have been prevented using AI-powered SONs. In consideration of these issues, the European Telecommunications Standards Institute (ETSI) introduced a zero-touch group to research automation and advance machine learning and AI techniques (deep learning) specifically for anomaly detection applications in mobile networks.

TABLE 1. Abbreviations used in this study.
This paper has four main contributions to the field as
summarized below:
• We present a comprehensive analysis of a conventional
machine learning method for anomaly detection in self-
organizing 5G networks (5G-SONs) and compare it
with a popular deep learning alternative using different
learning representations, including one-class and binary
learning.
• We claim state-of-the-art performance on a publicly
available dataset [8], which investigates multiple use
case scenarios for anomaly detection in 5G-SONs.
We demonstrate an average improvement of 15% over
the best recent performance which was achieved by a
deep auto-encoder-based setup.
• We demonstrate for the first time that data augmentation
methods can further boost anomaly detection perfor-
mance in binary mode, even when utilizing conventional
algorithmic methods such as support vector machines on
a sufficiently large dataset.
• Finally, we achieve nearly two orders of magnitude
improvement in computational speed and an order of
magnitude reduction in trainable parameters using con-
ventional machine learning to provide a robust alterna-
tive for 5G self-organizing networks, especially when the execution and detection times are critical.

The rest of this paper is organized as follows. Section II provides a brief summary of prior work on anomaly detection in current mobile and self-organizing networks. Section III introduces the methods used in this study. Section IV describes the experimental setup in detail and provides the hyper-parameters, dataset characteristics, implementation, evaluation metrics, and necessary details for repeatability. Finally, the results and discussions are presented in Section V, followed by the conclusions in Section VI. The abbreviations used in this paper are provided in Table 1 for easy reference.

II. RELATED RESEARCH
Anomaly detection in communications has been an active research area over the last decade. For instance, in [9], abnormal activity in the wireless spectrum was explored. Specifically, the authors used power spectral density (PSD) information to identify and pinpoint anomalies in the form of either undesired signals present in the licensed band or the absence of a desired signal. The information obtained from the PSD was processed using a combination of adversarial auto-encoders, convolutional neural networks, and long short-term memory recurrent neural networks.

In another example, [6] utilizes the measurements and handover statistics (inHO) from adjacent cells in a mobile communications network to expose abnormalities and outages. Monitoring in this way indicates a potential cell outage when the inHO information drops to zero. In [10], a novel online anomaly detection system was proposed for mobile networks to identify anomalies in key performance indicators (KPIs). The proposed system consists of a training block and a detection/tracking block. The system learns the most detrimental anomalies in the training block as each recent KPI is sourced, and monitors its status until the end of the second block. Thus, the detection of anomalies is tuned to prefer highly probable anomalies in the long term. Moreover, the system aims to report the minimum number of anomalies by maintaining a low positive rate, in line with network operators' preference to deal only with real anomalies. In addition, the system can be extended to next-generation networks via automatic adaptation to a new network behavior profile.

The authors in [11] proposed unsupervised learning to detect mobility-related anomalies using mobility robustness optimization (MRO), which is an important use case of SONs for modern 4G and 5G networks. A similar study in [12] takes a different perspective by using reinforcement learning to reduce call drop rates.
FIGURE 3. The ROC curves for the four datasets for the comparative analysis of modern and conventional machine learning with the results reported in the source paper.
for further research [23]. The dataset includes four different application scenarios where the data is collected periodically (with a 5 kHz sampling rate) from a minimization of drive test report [24], which includes mobile user information regarding the user activities recorded in the enclosed regions around the base stations (cells) in certain measurement periods. There are four different datasets with different numbers of users and use-cases. Fig. 2 demonstrates the basic structure of the dataset for the first use case (dataset 1), which consists of the measurement time, a unique ID assigned to each user, the coordinates of the user location in two dimensions at the time of measurement, the reference signal received power (RSRP), the reference signal received quality (RSRQ), and the label that shows whether the associated entry (i.e., the collection of measurements associated with that user) is anomalous (1) or not (0). RSRP and RSRQ are significant measures of signal level and quality in LTE networks, designated as key performance indicators (KPIs), in identifying whether the collected information is anomalous. The feature vector consists of the RSRP and RSRQ measurements from different cell locations (i.e., RSRP1, RSRQ1, RSRP2, RSRQ2, . . . , RSRP10, RSRQ10). The dataset has 11674 observations, of which only 60 are anomalous (i.e., roughly 1 in 200 measurement samples), and 25 features including the feature vector, user ID, location, and class labels.

The second dataset is similar to the first one in terms of the number of features and measurements, except that it has a much lower anomaly rate, with 8382 observations where only eight are anomalous (i.e., approximately 1 in 1000).

Datasets 3 and 4 represent different use cases, both of which have 114 features with a longer (80 s) recording time, which resulted in a much larger observation base (42000). As in datasets 1 and 2, they represent different anomaly rates, where dataset 3 has a much larger sampling of anomalous measurements (9635) compared to dataset 4 (22) because
B. HYPER-PARAMETERS
The most significant hyper-parameters in this study are the oversampling rate N and the number of nearest neighbors k used by the SMOTE algorithm when oversampling for the binary classification case. We tested two different sets of hyper-parameters: N = 300, k = 4 and N = 500, k = 5.
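To make these settings concrete, the following minimal sketch (not the authors' MATLAB implementation) shows how the two hyper-parameter sets could be applied through the imbalanced-learn SMOTE interface, with N interpreted as the percentage of synthetic minority samples added, as in the original SMOTE formulation [22]:

import numpy as np
from imblearn.over_sampling import SMOTE

def smote_oversample(X, y, N=300, k=4, minority_label=1):
    # Add N% synthetic minority samples using k nearest neighbors.
    n_min = int(np.sum(y == minority_label))
    target = n_min + (N * n_min) // 100
    sm = SMOTE(sampling_strategy={minority_label: target}, k_neighbors=k, random_state=0)
    return sm.fit_resample(X, y)

# The two hyper-parameter sets tested in this study:
# X_res, y_res = smote_oversample(X_train, y_train, N=300, k=4)
# X_res, y_res = smote_oversample(X_train, y_train, N=500, k=5)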
C. IMPLEMENTATION
All implementations were performed using MATLAB 2020b.
We tested both one-class and binary SVM models on all
datasets. The SMOTE algorithm was used with the binary
SVM model to adjust for the imbalanced datasets. The datasets were preprocessed, where the time, UserID, and location features were not included in the training process for fairness. Approximately 10% of the normal and anomalous samples were separated for testing. Both the training and testing samples were normalized to between (0,1].
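For concreteness, the preprocessing described above could look as follows in a minimal Python sketch (the actual pipeline was implemented in MATLAB; the file and column names here are assumptions used only for illustration):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("dataset1.csv")  # hypothetical file name for the first use case
# Exclude time, user ID, and location; keep the RSRP/RSRQ feature vector and the label.
X = df.drop(columns=["Time", "UserID", "LocX", "LocY", "Label"]).values
y = df["Label"].values            # 1 = anomalous, 0 = normal

# Hold out roughly 10% of both the normal and anomalous samples for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, stratify=y, random_state=0)

# Scale every feature to the unit interval (fit on the training split only).
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)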
The one-class SVM model was trained with only normal
samples using Gaussian RBF kernels. After training, we gen-
erated the SVM probability outputs with test samples, includ-
ing both normal and anomalous samples, to obtain receiver
operating characteristic (ROC) curves along with area-under-
the-curve (AUC) scores as performance metrics.
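A minimal sketch of this one-class procedure, written as an assumed scikit-learn analogue of the MATLAB implementation (the nu value is an assumption, not a reported setting), is given below:

from sklearn.svm import OneClassSVM
from sklearn.metrics import roc_curve, roc_auc_score

# Train on the normal samples only, with a Gaussian RBF kernel.
oc_svm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.01)
oc_svm.fit(X_train[y_train == 0])

# Score the mixed test set; decision_function is larger for "more normal" samples,
# so its negative serves as the anomaly score for the ROC analysis.
anomaly_score = -oc_svm.decision_function(X_test)
fpr, tpr, _ = roc_curve(y_test, anomaly_score)
print(f"One-class SVM AUC: {roc_auc_score(y_test, anomaly_score):.3f}")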
The binary SVM model is trained in exactly the same fashion except that, in addition to the above process, the anomaly samples are oversampled with SMOTE to generate balanced datasets prior to training and testing the algorithm.

D. EVALUATION METRICS
In this study, we evaluated performance by looking at ROC curves and AUC scores. The ROC is a probability curve that shows the model's ability to identify the positive class appropriately. It is plotted with the true positive rate (TPR) on the y-axis and the false positive rate (FPR) on the x-axis, where TPR is the percentage of correctly classified positive outputs and FPR is the percentage of incorrectly classified positive outputs, as expressed below:

\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}, \qquad \mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP}+\mathrm{TN}} \tag{5}

The AUC, on the other hand, provides a summarized number as an indication of how powerful the model is in discriminating between classes, and corresponds to the area under the ROC curve:

\mathrm{AUC} = \int_{0}^{1} \mathrm{TPR}\; d(\mathrm{FPR}) \tag{6}

V. RESULTS AND DISCUSSION
The ROC curves for each of the four datasets are shown in Fig. 3. The red curve in each figure represents the latest state-of-the-art reported on this dataset using a deep autoencoder [8]. For comparison, different SVM implementations are represented in different colors, including binary and one-class combinations with and without augmentation using SMOTE. The results are generally consistent except in the case of dataset 4, where all SVM combinations outperformed the deep autoencoder at all levels of TPR and FPR. In the case of the first three datasets, the deep autoencoder outperformed the SVM implementations without augmentation. However, when SMOTE is used, both one-class and binary modalities of the SVM demonstrate higher performance compared to the original paper, and in some cases significantly so.

FIGURE 4. Area under curve (AUC) scores across different datasets.
TABLE 3. Representation of AUC scores of all datasets for one-class SVM and binary SVM classifiers.

The results are further summarized in Fig. 4, which provides an overview of AUC scores across different datasets and algorithms, where both binary and one-class SVMs (with the three curves at the top) clearly outperform the deep autoencoder approach shown in red. Table 3 provides the absolute numbers in terms of the AUC at the 5% significance level. The best combination of SMOTE and SVM modality outperformed the deep autoencoder by 19.75%, 15.5%, 15.5%, and 13% for datasets 1, 2, 3, and 4, respectively. This corresponds to an average performance improvement of over 15% across all the application scenarios. It is important to note that even without artificial augmentation of the dataset, conventional machine learning using SVM still outperforms the previous state-of-the-art methods reported in the literature. However, the difference in performance was noticeably less.
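To make the comparison concrete, the sketch below scores the binary SVM with SMOTE augmentation using the ROC/AUC metrics of Eqs. (5) and (6), reusing the train/test split from the preprocessing sketch above (a scikit-learn/imbalanced-learn analogue rather than the MATLAB toolchain used in this study; the kernel and SMOTE settings are assumptions):

from imblearn.over_sampling import SMOTE
from sklearn.svm import SVC
from sklearn.metrics import roc_curve, roc_auc_score

# Balance the training set by synthesizing anomalous samples, then fit an RBF-kernel SVM.
X_bal, y_bal = SMOTE(k_neighbors=4, random_state=0).fit_resample(X_train, y_train)
clf = SVC(kernel="rbf", gamma="scale", probability=True).fit(X_bal, y_bal)

# Evaluate on the untouched test set.
score = clf.predict_proba(X_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, score)
print(f"Binary SVM + SMOTE AUC: {roc_auc_score(y_test, score):.3f}")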
In order to study whether the SMOTE algorithm would provide a similar performance boost for the autoencoder setup, we focused on the first two datasets, where the imbalance is much more significant, as described in the Asghar et al. [8] study. We observed that the average detection accuracy with the SMOTE hyper-parameters N = 300, k = 4 and N = 500, k = 5 was 64.03% and 57%, respectively, on dataset 1 and 68.87% and 51.64%, respectively, on dataset 2. These results are either nonsignificant improvements over the baseline performance or in fact even worse compared to not applying SMOTE on the dataset. There are many explanations for this, but the most likely one is that the operational structure of the latent space in an autoencoder is very similar to the way SMOTE calculates augmented samples – in other words, the advantage of using SMOTE is negated in the latent coding layer of the autoencoder itself.

It is important to discuss the nonsignificant or, in some cases, negative impact of SMOTE on the performance of the one-class SVM topology. SMOTE works by generating samples based on the nearest neighbor similarities of intra-class samples and the differences of inter-class samples. This method works best when training a binary classifier where one class may be less represented than the other, as evidenced by the performance boost observed in binary SVM training. However, in the one-class learning representation, only the majority class is used in training, which means that SMOTE can only have an indirect effect on the number and quality of the samples generated for the normal class and subsequently does not have a direct role in boosting performance. The drop in performance in some cases can similarly be linked to the quality of the anomaly class samples being generated (and thus affecting testing performance) not making up for the additional information which now cannot be used in the training process.

A. COMPUTATIONAL COMPLEXITY ANALYSIS
Computational complexity analysis is an important step in identifying the strengths and weaknesses of conventional algorithms, such as one-class and binary SVMs, when compared to more modern approaches such as the autoencoders used in anomaly detection. In this paper, we focused on both the raw computational times, specifically for the testing phase of each of the four dataset scenarios, as well as the number of trainable parameters for both algorithms. We also compared how SMOTE affected the complexity.

All measurements are done using the latest version of MATLAB at the time of this writing (2021b) using the standard time measurement scripts. On the first dataset, for the one-class SVM algorithm, the testing time took 90 ms without SMOTE and 140 ms with SMOTE, whereas it took 9000 ms to run the autoencoder, almost a 100-fold increase. The additional complexity of SMOTE is more than offset by its significant contribution to the accuracy. We have observed similar computational times for the rest of the use-case scenarios (i.e., for the second dataset, one-class SVM testing times were 110-130 ms (SMOTE) compared to 7120 ms for the AE), where there were orders of magnitude improvement in testing times. We have performed the computational speed analysis using the one-class SVM due to the fact that its implementation was done using a native script, whereas for the binary SVM, a GUI toolbox was used with better visualization capabilities, which affects speed. However, there should be no difference in testing times since the SVM topologies are identical, with the only difference being the way the data is represented to the algorithms.

In terms of the trainable parameters as an indicator of computational complexity, the SVM models on average had 23 support vectors compared to the 660 weight and 76 bias parameters (a total of 736) that need to be trained for the autoencoder. Based on the above measurements for computational times and complexity, SVM-based approaches are less complex even when using SMOTE, have competitive AUC scores, and are thus more suitable for time-critical scenarios such as anomaly detection / outage recovery compared to the AE.
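For reference, a minimal sketch of how such a comparison can be instrumented is shown below, reusing the one-class model and test split from the earlier sketches: the SVM test pass is timed and its support vectors are counted, and the trainable parameters of a fully connected autoencoder are tallied for a placeholder topology (the layer sizes are illustrative assumptions, not the architecture of [8], and time.perf_counter is a Python stand-in for MATLAB's timing scripts):

import time

# Time the one-class SVM testing pass.
t0 = time.perf_counter()
_ = oc_svm.decision_function(X_test)
print(f"SVM test time: {(time.perf_counter() - t0) * 1e3:.1f} ms")
print(f"Support vectors: {oc_svm.support_vectors_.shape[0]}")

# Weight and bias count of a fully connected autoencoder with layer sizes dims.
def ae_param_count(dims):
    return sum(dims[i] * dims[i + 1] + dims[i + 1] for i in range(len(dims) - 1))

print(f"Autoencoder trainable parameters: {ae_param_count([20, 10, 4, 10, 20])}")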
VI. CONCLUSION
In this study, we explored the potential of conventional machine learning when compared to deep learning for anomaly detection in SONs. Anomaly detection has been a popular application area of deep learning for cell outages in communication networks. However, as in other domains, conventional methods can still provide strong statistical alternatives with the right learning representations. In this paper, we focused on SVMs with one-class and binary learning scenarios on a previously published and publicly available dataset. We found that while deep learning was highly competitive, standard SVMs using RBF kernels can be trained to outperform a deep autoencoder approach. Both one-class and binary classification can benefit immensely from synthetic augmentation of the dataset using SMOTE, with improvements in detection accuracy of as much as 15% on average over four different application scenarios.

Future work will study the impact of augmentation on other learning algorithms, specifically statistical deep learning, such as variational auto-encoders. The work presented in this paper can be further extended to other applications beyond anomaly or outage detection. Specifically, there has been increased attention to modulation detection in next-generation mobile wireless networks, where fast, robust, and light machine learning models could enable time-critical applications in signal classification and modulation detection. Improvements in speed can be realized both at the algorithm level and in the data preprocessing stages using techniques such as principal component analysis to identify the most relevant features for classification and detection. Finally, statistical learning algorithms, such as Gaussian Process Regression, which have gained immense popularity as alternatives to deep learning, can be applied to different scenarios, especially when data is not present in sufficiently large volumes to properly train DL models with many parameters.

REFERENCES
[1] F. Boccardi, R. W. Heath, A. Lozano, T. L. Marzetta, and P. Popovski, "Five disruptive technology directions for 5G," IEEE Commun. Mag., vol. 52, no. 2, pp. 74–80, Feb. 2014.
[2] J. G. Andrews, S. Buzzi, W. Choi, S. V. Hanly, A. Lozano, A. C. Soong, and J. C. Zhang, "What will 5G be?" IEEE J. Sel. Areas Commun., vol. 32, no. 6, pp. 1065–1082, Jun. 2014.
[3] A.-S. Bana, E. de Carvalho, B. Soret, T. Abrão, J. C. Marinello, E. G. Larsson, and P. Popovski, "Massive MIMO for Internet of Things (IoT) connectivity," Phys. Commun., vol. 37, Dec. 2019, Art. no. 100859.
[4] D. Ciuonzo, P. S. Rossi, and S. Dey, "Massive MIMO channel-aware decision fusion," IEEE Trans. Signal Process., vol. 63, no. 3, pp. 604–619, Feb. 2015.
[5] Airhop. Accessed: Jun. 18, 2021. [Online]. Available: http://www.airhopcomm.com
[6] I. de-la-Bandera, R. Barco, P. Muñoz, and I. Serrano, "Cell outage detection based on handover statistics," IEEE Commun. Lett., vol. 19, no. 7, pp. 1189–1192, Jul. 2015.
[7] Ericsson. Network Outage. Accessed: Jun. 18, 2021. [Online]. Available: http://telecoms.com/494091/
[8] M. Z. Asghar, M. Abbas, K. Zeeshan, P. Kotilainen, and T. Hämäläinen, "Assessment of deep learning methodology for self-organizing 5G networks," Appl. Sci., vol. 9, no. 15, p. 2975, Jul. 2019.
[9] S. Rajendran, W. Meert, V. Lenders, and S. Pollin, "Unsupervised wireless spectrum anomaly detection with interpretable features," IEEE Trans. Cognit. Commun. Netw., vol. 5, no. 3, pp. 637–647, Sep. 2019.
[10] J. Burgueño, I. de-la-Bandera, J. Mendoza, D. Palacios, C. Morillas, and R. Barco, "Online anomaly detection system for mobile networks," Sensors, vol. 20, no. 24, p. 7232, Dec. 2020.
[11] J. Moysen, F. Ahmed, M. Garcia-Lozano, and J. Niemela, "Unsupervised learning for detection of mobility related anomalies in commercial LTE networks," in Proc. Eur. Conf. Netw. Commun. (EuCNC), Jun. 2020, pp. 111–115.
[12] W. Qin, Y. Teng, M. Song, Y. Zhang, and X. Wang, "A Q-learning approach for mobility robustness optimization in LTE-SON," in Proc. 15th IEEE Int. Conf. Commun. Technol., Nov. 2013, pp. 818–822.
[13] Z. Chen, C. K. Yeo, B. S. Lee, and C. T. Lau, "Autoencoder-based network anomaly detection," in Proc. Wireless Telecommun. Symp. (WTS), Apr. 2018, pp. 1–5.
[14] T. O'Shea and J. Hoydis, "An introduction to deep learning for the physical layer," IEEE Trans. Cogn. Commun. Netw., vol. 3, no. 4, pp. 563–575, Dec. 2017.
[15] G. Aceto, D. Ciuonzo, A. Montieri, and A. Pescapé, "Mobile encrypted traffic classification using deep learning: Experimental evaluation, lessons learned, and challenges," IEEE Trans. Netw. Service Manag., vol. 16, no. 2, pp. 445–458, Feb. 2019.
[16] R. Kanjilal and I. Uysal, "The future of human activity recognition: Deep learning or feature engineering?" Neural Process. Lett., vol. 53, no. 1, pp. 561–579, Feb. 2021.
[17] J. A. K. Suykens and J. Vandewalle, "Least squares support vector machine classifiers," Neural Process. Lett., vol. 9, no. 3, pp. 293–300, Jun. 1999.
[18] V. Vapnik, The Nature of Statistical Learning Theory. New York, NY, USA: Springer, 2013.
[19] C. Cortes and V. Vapnik, "Support-vector networks," Mach. Learn., vol. 20, no. 3, pp. 273–297, 1995.
[20] B. E. Boser, I. M. Guyon, and V. N. Vapnik, "A training algorithm for optimal margin classifiers," in Proc. 5th Annu. Workshop Comput. Learn. Theory (COLT), 1992, pp. 144–152.
[21] V. Palade, "Class imbalance learning methods for support vector machines," in Imbalanced Learning: Foundations, Algorithms, and Applications. Hoboken, NJ, USA: Wiley, 2013, p. 83.
[22] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: Synthetic minority over-sampling technique," J. Artif. Intell. Res., vol. 16, pp. 321–357, Jun. 2002.
[23] M. Z. Asghar. Zeeshan: Simulation Data Sets. Accessed: Oct. 6, 2022. [Online]. Available: https://github.com/Zeeshan-bit/Sim-Dim
[24] J. Johansson, W. A. Hapsari, S. Kelley, and G. Bodog, "Minimization of drive tests in 3GPP release 11," IEEE Commun. Mag., vol. 50, no. 11, pp. 36–43, Nov. 2012.

MUHAMMED FURKAN KUCUK (Graduate Student Member, IEEE) received the B.S. degree in electrical and electronics engineering from Gaziantep University, Gaziantep, Turkey, in 2014, and the M.S. and Ph.D. degrees in electrical engineering from the University of South Florida (USF), Tampa, FL, USA, in 2018 and 2022, respectively.
Since 2018, he has been a Research Assistant with USF. His research interests include machine learning, deep neural networks for unsupervised learning, and ML applications in wireless communications. His awards and honors include the Turkish Government Scholarship for the M.S. and Ph.D. degrees.

ISMAIL UYSAL (Member, IEEE) received the B.S. degree in electrical and electronics engineering from Middle East Technical University, Ankara, Turkey, in 1998, and the M.S. and Ph.D. degrees in electrical and computer engineering from the University of Florida (UF), Gainesville, FL, USA, in 2006 and 2008, respectively.
From 2008 to 2010, he was a Postdoctoral Research Fellow with the Research Center for Food Distribution and Retailing, UF. Since 2010, he has been with the University of South Florida, where he is currently an Assistant Professor of electrical engineering and the Director of the Radio Frequency Identification (RFID) Laboratory for Applied Research, College of Engineering. His research interests include deep machine learning theory and applications in semi-supervised and unsupervised settings, data-oriented applications of RFID systems in healthcare and food supply chains, and signal processing algorithms for brain–computer interfaces.