Machine Learning Appledore Oracle WP
March 2022
Machine Learning
Application in Telecommunication networks to
improve anomaly detection
Author: Patrick Kelly
In partnership with
© Appledore Research LLC 2022. All rights reserved. No part of this publication may be
reproduced, stored in a retrieval system or transmitted in any form or by any means –
electronic, mechanical, photocopying, recording or otherwise – without the prior written
permission of the publisher.
Figures and projections contained in this report are based on publicly available information
only and are produced by the Research Division of Appledore Research LLC independently of
any client-specific work within Appledore Research LLC. The opinions expressed are those of the
stated authors only.
Appledore Research LLC recognizes that many terms appearing in this report are proprietary; all
such trademarks are acknowledged, and every effort has been made to indicate them by the
normal USA publishing standards. However, the presence of a term, in whatever form, does not
affect its legal status as a trademark.
Appledore Research LLC maintains that all reasonable care and skill have been used in the
compilation of this publication. However, Appledore Research LLC shall not be under any
liability for loss or damage (including consequential loss) whatsoever or howsoever arising
because of the use of this publication by the customer, his servants, agents or any third party.
EXECUTIVE SUMMARY
Machine learning promises to solve complex problems in the telecommunication operational
domain much faster, more cheaply, and more accurately than the best human domain experts
could ever hope to achieve. Applied correctly, machine learning has the potential to predict,
identify, and quickly isolate anomalies, yielding positive business outcomes. This is possible now
because of high-quality data sets, advances in compute processing, and the teaming of telco
domain experts with data scientists to develop robust machine learning algorithms. The best
way to think about machine learning is that it minimizes a cost function, such as an error rate:
it processes millions of data points, searches for patterns, and produces results with a higher
rate of accuracy than traditional methods.
The modern telecommunication network operations center (NOC) processes billions of events
daily. The tools available to operators are a collection of domain-specific fault, performance,
and event-reduction consoles that require experts to understand the context of alarms and
interpret potentially service-impacting events. In most cases, service-impacting events that
violate service level agreements and cause widespread network outages occur because even
the best NOC experts struggle to identify, isolate, and accurately detect the root cause of rare
events.
The field of machine learning (ML) applied to the telecommunication network is in its infancy.
Up to now, most ML use cases have been applied to voice-assist bots in the customer care area.
ML applied to the NOC has the potential to improve customer satisfaction by reducing mean
time to repair (MTTR) and to lower operational cost by offloading outlier detection from
humans to machines.
TRADITIONAL METHODS
Most service assurance systems deployed over the past decade have used two primary
techniques to control alarm storms. The first was a rules-based correlation approach, which
programmed specific parameters into the source code using If/Then/Else logic. An evolution
of this method used relationship modeling in the network topology; classic examples of this
approach applied signature patterns that were compared against masters in the code. Both
methods were effective but still required NOC expertise to isolate the faults and service
impacts that the code missed, along with constant maintenance and additions by an army of
highly skilled resources.
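The If/Then/Else style of rules-based correlation can be illustrated with a minimal sketch. The
alarm fields, rule thresholds, and classification labels below are hypothetical examples, not
taken from any particular assurance product; the point is that every condition must be
hand-coded and maintained by experts.

```python
# Minimal sketch of rules-based alarm correlation (illustrative rules only).
# Every new failure signature requires another hand-written branch.

def correlate(alarm: dict) -> str:
    """Classify a single alarm with hard-coded If/Then/Else rules."""
    if alarm.get("type") == "LOS" and alarm.get("severity") == "critical":
        # Loss-of-signal at critical severity: suspect a fiber cut
        return "fiber-cut-suspected"
    elif alarm.get("type") == "LOS":
        return "link-degraded"
    elif alarm.get("count", 0) > 100:
        # Suppress alarm storms above a fixed repeat count
        return "storm-suppressed"
    else:
        return "pass-through"
```

A rule set like this works until the network changes; any fault pattern the rules do not
anticipate falls through to the NOC expert.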
Figure 1 provides an illustrative view of how much better machines can perform than the best
human experts. As networks become more complex, data grows and human experts become
overwhelmed using traditional tools. Machine learning is capable of processing more data in
shorter cycle times with higher degrees of accuracy.
MACHINE LEARNING
We are now entering the era of machine learning (ML). ML can be applied effectively today
because processing power has improved through advances in graphics processing units, data
is widely available as sensors become ubiquitous, and the field of data science applied to
specific industries can yield better outcomes at a fraction of the cost of human experts.
Applying machine learning models provides a degree of confidence in identifying potentially
new network faults that traditional methods miss. The most common way to apply this method
in production environments is to examine each data set from a sensor and impose minimum
and maximum value limits on it. If the current value is within the range, the network is healthy;
if it is outside the range, the network is not running at optimal capacity and an event should be
triggered.
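The min/max check described above can be sketched in a few lines. The KPI names and the
healthy ranges below are illustrative assumptions, not vendor-specified limits.

```python
# Minimal sketch of per-KPI min/max health checking (illustrative limits).
LIMITS = {
    "rx_power_dbm": (-28.0, -8.0),  # received optical power, dBm
    "ber": (0.0, 1e-9),             # bit error rate
}

def in_range(kpi: str, value: float) -> bool:
    """Return True if the KPI value falls within its healthy min/max limits."""
    lo, hi = LIMITS[kpi]
    return lo <= value <= hi

def evaluate(sample: dict) -> list:
    """Return the list of KPIs that should trigger an event."""
    return [k for k, v in sample.items() if k in LIMITS and not in_range(k, v)]
```

For example, a sample with `rx_power_dbm` at -35.0 would trigger an event for that KPI while
a reading of -15.0 would not.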
Applying machine learning in the optical network domain: a common occurrence that is
difficult to detect is optical fiber insertion loss, the measurement of light lost between two
fixed points in the fiber trunk. This often affects packet retransmission and delay, which can
further degrade service quality for video, broadband business applications, and voice. It is
frequently the result of bending of the optical fiber, cracks in the glass, or blunt-force trauma
from excavation machinery. Attenuation on a fiber optic cable can be difficult to isolate in most
situations, resulting in wasted truck rolls and time-consuming troubleshooting processes.
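Insertion loss itself has a standard definition: the ratio, in decibels, of optical power entering a
span to optical power leaving it. A minimal sketch of the calculation, with illustrative power
values:

```python
import math

def insertion_loss_db(p_in_mw: float, p_out_mw: float) -> float:
    """Insertion loss in dB between two measurement points on the fiber.

    Standard definition: 10 * log10(P_in / P_out), with powers in the same
    linear unit (milliwatts here).
    """
    return 10.0 * math.log10(p_in_mw / p_out_mw)

# Launching 1.0 mW and receiving 0.5 mW means half the light was lost,
# which corresponds to roughly 3 dB of insertion loss.
```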
The ML process begins with data preparation. Candidate ML algorithms are identified to
achieve the best results, and a subset of historical data is used to train the model(s). Part of
this process is identifying the key performance indicators provided by the infrastructure and
establishing training data sets from both clean and crushed fiber links. The model is then
tested and its accuracy checked over many iterations, and the results should improve with each
successive pass. Often, techniques such as backpropagation are necessary, in which the
weightings applied to the model's inputs are adjusted to improve its accuracy. The model can
then be deployed against live data.
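The train-and-adjust loop described above can be sketched with a simple model. The sketch
below trains a logistic-regression classifier by iteratively adjusting input weightings to reduce
prediction error, in the spirit of the weight-adjustment step the text describes. The synthetic
KPI values (attenuation and received-power delta for clean versus crushed fiber) are invented
for illustration only.

```python
import math
import random

random.seed(0)

# Synthetic training data (illustrative values): each sample is
# (attenuation_db_per_km, rx_power_delta_db), labeled 0 = clean, 1 = crushed.
clean   = [((random.gauss(0.25, 0.05), random.gauss(0.3, 0.1)), 0) for _ in range(200)]
crushed = [((random.gauss(1.50, 0.30), random.gauss(3.0, 0.5)), 1) for _ in range(200)]
data = clean + crushed

w, b, lr = [0.0, 0.0], 0.0, 0.1

def predict(x):
    """Probability the sample is a crushed fiber link."""
    z = w[0] * x[0] + w[1] * x[1] + b
    z = max(-30.0, min(30.0, z))  # clamp for numerical stability
    return 1.0 / (1.0 + math.exp(-z))

# Training loop: adjust the weightings on each input to reduce the error.
for _ in range(200):
    random.shuffle(data)
    for x, y in data:
        err = predict(x) - y      # prediction error for this sample
        w[0] -= lr * err * x[0]   # nudge each weight against the error
        w[1] -= lr * err * x[1]
        b    -= lr * err

# Accuracy should improve with successive iterations over the data.
acc = sum((predict(x) > 0.5) == bool(y) for x, y in data) / len(data)
```

In production, this toy loop would be replaced by a properly validated model with held-out test
data, but the mechanics of iteratively adjusting input weightings are the same.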
BUSINESS OUTCOME
Oracle has developed a robust model for the above use case that provides out-of-the-box
machine learning solutions to help CSPs resolve outlier events faster and with more accuracy
than current operational processes. Using proven modelling techniques, anomalies can be
detected and isolated on optical fiber links in near real time, reducing SLA penalties.
Customers of Oracle Unified Assurance have been able to identify whether data points are
normal or anomalous. Oracle looks at hundreds of variables and compares the data set to
known learned conditions. As variables exceed thresholds, the model begins to recognize
potentially anomalous conditions. This condition monitoring is the result of sourcing data from
many sub-domain managers and connected network resources in the operator's network
environment. Minimum and maximum values can be set; if any value falls outside its min/max
range, the model detects the anomaly and analyzes it against the combination of KPI values in
the model (for example, a potential fiber crush).
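The step from a single breached threshold to a named condition can be sketched as a check on
combinations of KPI breaches. The KPI names, ranges, and the "fiber crush" signature below
are hypothetical illustrations, not Oracle Unified Assurance internals.

```python
# Sketch: combine per-KPI min/max breaches into a named condition.
# All KPI names, ranges, and signatures are illustrative assumptions.

RANGES = {
    "attenuation_db_per_km": (0.1, 0.5),
    "rx_power_dbm": (-28.0, -8.0),
    "ber": (0.0, 1e-9),
}

# Hypothetical signature: a fiber crush breaches both attenuation and rx power.
FIBER_CRUSH = {"attenuation_db_per_km", "rx_power_dbm"}

def anomalies(sample: dict) -> set:
    """Return the set of KPIs outside their min/max ranges."""
    return {k for k, v in sample.items()
            if k in RANGES and not (RANGES[k][0] <= v <= RANGES[k][1])}

def classify(sample: dict) -> str:
    """Map the combination of breached KPIs to a condition label."""
    breached = anomalies(sample)
    if FIBER_CRUSH <= breached:
        return "potential-fiber-crush"
    return "anomaly" if breached else "normal"
```

The key idea is that no single threshold breach names the fault; the pattern across KPIs does.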
RECOMMENDATIONS
We expect that machine learning and its use in telecommunication operators' businesses will
proliferate over the next several years, driven by use cases that prove more cost-effective and
achieve better results than current methods. The biggest barrier to adoption of ML-based
solutions is cultural belief, not the technology itself. CSPs considering ML technology should
work closely with suppliers that have proved out similar use cases. The benefits of a
collaborative approach are access to domain expertise, less time wasted identifying useful
data sets, and faster implementation of ML-based models that have already been proven in
commercial deployments.
www.appledoreresearch.com