Presentation 8

Self-healing in mobile
networks with big data

Presented by:
Md Shakeer
21951A1244
IT-A
Abstract
 To enable the automatic deployment and management of cellular networks, the
concept of the self-organizing network (SON) was introduced.
 SON capabilities can enhance network performance, improve service quality, and
reduce operational and capital expenditure (OPEX/CAPEX).
 As an important component in SON, self-healing is defined as a paradigm where the
faults of target networks are mitigated or recovered by automatically triggering a
series of actions such as detection, diagnosis, and compensation.
 Data-driven machine learning has been recognized as a powerful tool to bring
intelligence into networks and to realize self-healing.
 However, there are major challenges for practical applications of machine learning
techniques for self-healing.
 In this article, we first classify these challenges into five categories: (1) data
imbalance, (2) data insufficiency, (3) cost insensitivity, (4) non-real-time response,
and (5) multisource data fusion.
 Then, we provide potential technical solutions to address these challenges.
 Furthermore, a case study of cost-sensitive fault detection with imbalanced data is
provided to illustrate the feasibility and effectiveness of the suggested solutions.
Introduction
 As cellular networks develop towards 5G and
beyond, they are evolving into more complex structures characterized by
heterogeneity and dense deployment.
 In these networks, traditional (e.g., nonautomated) methods for
network deployment, configuration, optimization, and
maintenance will be inefficient and will incur huge operational and
maintenance expenditures.
 This has led to the concept of the self-organizing network (SON)
advocated by the Third Generation Partnership Project (3GPP) and the
Next Generation Mobile Networks (NGMN) alliance.
 SON includes three main functions: self-configuration, self-optimization, and self-
healing.
 SON capabilities will enable more flexible planning and deployment of mobile
networks, more efficient optimization and maintenance, less manual intervention,
and lower capital expenditure (CAPEX) and operational expenditure (OPEX) [1].
 As an important SON functionality, self-healing will automatically detect faults of
target networks (e.g., cellular networks) and trigger corresponding actions to fix
them.
 The self-healing functionality mainly includes four phases: fault detection, diagnosis,
compensation, and recovery.
 The goal of fault detection is to find problems such as unacceptable service quality
(e.g., due to coverage hole, excessive interference, and excessive antenna uptilt or
downtilt).
 Fault diagnosis identifies the root cause based on key performance indicators (KPIs)
and alarms.
 In traditional networks, it is common that operators are aware of service failures
only after receiving a large number of user complaints.
 For failure recovery, the experience of technicians is of paramount importance.
 In comparison, the objective of self-healing is to perform these tasks automatically
and proactively.
 Naturally, the introduction of intelligence into networks is required, for which
machine learning has been recognized as a powerful tool.
 Specifically, machine learning techniques are able to automatically generate
inference and classification models by training on collected data, offering accurate
results for reliable decision making.
 Different types of machine learning techniques (e.g., supervised learning,
unsupervised learning, and reinforcement learning) have been leveraged for
self-healing.
 For example, many learning algorithms are devised to detect cell outage and to
compensate degraded network performance of problematic cells.
Materials and Methods
Challenges in Data-Driven Machine Learning

Although machine learning technologies facilitate the development of self-healing
methods for cellular networks, several major challenges exist which can impact the
performance and practical implementations. In this article, we classify these
challenges into the following five categories:
(i) Data Imbalance. In cellular networks, because some events (e.g.,
network failures) occur only rarely, the collected data sets are usually imbalanced.
Imbalanced data can significantly degrade the performance of classifiers, which
are likely to be skewed towards the majority class. However, existing schemes rarely
take the issue of data imbalance into account.
(ii) Data Insufficiency. A lack of high-quality data can result in severe
overfitting of learning models (e.g., classifiers). First, a data set obtained from a
high-fidelity network simulator may not fully represent the measurements in practical
cellular networks. Second, the real data from network operators (e.g., log data) may not
be well organized and labeled, so it is difficult to extract effective information and build
knowledge from these data.
(iii) Cost Insensitivity. Most existing schemes pursue a low detection error
rate while ignoring the fact that different types of misclassification errors can cause
different losses to the operators. In such cases, treating accuracy as the only
evaluation criterion is inadequate, and cost sensitivity should also be considered.
(iv) Non-Real-Time Response. Most existing self-healing schemes do not meet
real-time response requirements because they are reactive. Specifically,
they mainly act after the fact (e.g., diagnosing after malfunctions occur).
Designing proactive schemes that reduce the delay and enable real-time response is
challenging.
(v) Multisource Data Fusion. In theory, data from different levels, such as the
subscriber level, cell level, and core network level, can be jointly exploited to achieve
better performance [2]. However, multisource data make model
construction more difficult. Therefore, performing multisource data fusion for self-healing is a
challenging issue.
Data Imbalance
 Data imbalance often occurs in machine learning and data mining, when at least
one class contains many more samples than the other classes.
 For convenience, we term the class containing numerous samples as the majority
class, and the class including a relatively small number of samples as the minority
class.
 The ratio of the number of samples in the minority class to that in the majority class is
used to measure the degree of data imbalance. In general, when this ratio is close
to 1, the data imbalance is negligible. On the other hand, when the ratio is
significantly less than 1, the imbalance may severely hamper the performance of
classifiers.
 In self-healing, fault detection and diagnosis can be considered typical classification
problems. Accordingly, existing machine learning-based classification methods can
be applied: measurement data collected during network operation are
used to train the corresponding classifiers.
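The imbalance ratio described above can be illustrated with a minimal sketch. It uses the class counts reported in the case study of this presentation (117 fault samples and 3,363 fault-free samples); the function names are hypothetical and chosen only for illustration. The second function shows why imbalance is a problem: a classifier that always predicts the majority class already reaches an accuracy of about 96.6% without detecting a single fault.

```python
def imbalance_ratio(minority_count: int, majority_count: int) -> float:
    """Ratio of minority to majority samples; values near 1 mean balanced data."""
    if minority_count <= 0 or majority_count <= 0:
        raise ValueError("both class counts must be positive")
    if minority_count > majority_count:
        raise ValueError("minority_count must not exceed majority_count")
    return minority_count / majority_count


def majority_baseline_accuracy(minority_count: int, majority_count: int) -> float:
    """Accuracy of a trivial classifier that always predicts the majority class."""
    return majority_count / (minority_count + majority_count)


# Class counts taken from the case study: 117 fault, 3,363 fault-free samples.
ratio = imbalance_ratio(117, 3363)
baseline = majority_baseline_accuracy(117, 3363)

print(f"imbalance ratio: {ratio:.4f}")
print(f"accuracy of always predicting fault-free: {baseline:.4f}")
```

A ratio of roughly 0.035 is far below 1, so accuracy alone says little about how well faults are detected.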
Data Insufficiency
 Data insufficiency arises mainly due to the following reasons. First, for most
researchers in universities and research institutions, acquiring sufficient data from
network operators is not an easy task due to privacy and business issues.
 In the existing literature on self-healing, most of the works use data from some
high-fidelity network simulators (e.g., NS3, Vienna-LTE, and LTE-Sim).
 Though these simulators provide a good simulation environment, the data collected
via simulations cannot fully represent real network scenarios.
 Although mobile network measurement data may be collected by means of third-party
sniffers or applications on mobile devices, some measurement data (e.g., fault-indicating
data) are difficult to collect.
 Second, network operators have a huge amount of operation data which are stored
in system logs. However, these data may not be well organized and labeled.
Cost Insensitivity
 To evaluate the performance of a machine learning method, metrics such as
accuracy, generalization ability, interpretability, time and space complexity, and
cost sensitivity need to be taken into account.
 However, the traditional machine learning methods for self-healing focus primarily
on maximizing accuracy, and they ignore the cost involved in the classification
process (i.e., assume equal costs for different misclassification errors).
 In real-world scenarios, different misclassification errors often have varying costs.
For example, within the process of cell failure diagnosis in self-healing, the cost of
mistakenly diagnosing a malfunction as a fault-free case is larger than that of
identifying a fault-free case as a case of malfunctioning.
 Misclassifying a fault-free case as a malfunction at least attracts the attention of
engineers, who can then check for the failure.
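The difference between accuracy and cost can be illustrated with a small sketch. The cost values (10 for a missed fault, 1 for a false alarm) and the toy labels are assumptions made purely for illustration and are not taken from the source. Class 0 denotes a fault and class 1 a fault-free case, following the labeling used in the case study.

```python
def confusion_counts(y_true, y_pred, fault_label=0):
    """Return (detected, missed, false_alarms, correct_normal) for the fault class."""
    detected = missed = false_alarms = correct_normal = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == fault_label and pred == fault_label:
            detected += 1
        elif truth == fault_label:
            missed += 1          # fault diagnosed as fault-free
        elif pred == fault_label:
            false_alarms += 1    # fault-free case diagnosed as a fault
        else:
            correct_normal += 1
    return detected, missed, false_alarms, correct_normal


def accuracy(y_true, y_pred):
    """Fraction of samples classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def total_cost(y_true, y_pred, cost_missed_fault, cost_false_alarm):
    """Sum of misclassification costs under an asymmetric cost assignment."""
    _, missed, false_alarms, _ = confusion_counts(y_true, y_pred)
    return missed * cost_missed_fault + false_alarms * cost_false_alarm


# Toy data: 2 faults (label 0) and 8 fault-free cases (label 1).
y_true = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
pred_a = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]  # misses both faults
pred_b = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]  # finds both faults, raises 2 false alarms

# Hypothetical costs: a missed fault is assumed 10x as costly as a false alarm.
for name, pred in (("A", pred_a), ("B", pred_b)):
    print(name, accuracy(y_true, pred), total_cost(y_true, pred, 10, 1))
```

Both classifiers have the same accuracy, yet under these assumed costs classifier A is ten times as expensive as classifier B, which is why accuracy alone is not a sufficient criterion.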
Solutions for Data Imbalance
(1) Data Preprocessing. This approach aims to convert imbalanced data into balanced data
by changing the distribution of the target data sets before they are fed to the
machine learning algorithms. The common preprocessing methods are
undersampling and oversampling, which change the distribution of training samples.
Specifically, undersampling randomly removes majority class samples,
and oversampling duplicates minority class samples, until a
balanced data set is produced. However, undersampling may discard
important information in the majority class, and oversampling may
cause overfitting because minority class samples are duplicated [3].
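Random undersampling and random oversampling, as described above, can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function names and the toy samples are hypothetical, and the sketch assumes a binary problem in which the minority class is no larger than the majority class.

```python
import random


def random_undersample(majority, minority, rng):
    """Keep all minority samples and a random majority subset of the same size."""
    kept_majority = rng.sample(majority, k=len(minority))
    return kept_majority, list(minority)


def random_oversample(majority, minority, rng):
    """Duplicate randomly chosen minority samples until both classes match in size."""
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return list(majority), list(minority) + extra


# Toy data: 20 fault-free samples and 3 fault samples (illustrative only).
majority = [("fault-free", i) for i in range(20)]
minority = [("fault", i) for i in range(3)]
rng = random.Random(0)  # fixed seed so the sketch is repeatable

under_major, under_minor = random_undersample(majority, minority, rng)
over_major, over_minor = random_oversample(majority, minority, rng)

print(len(under_major), len(under_minor))  # both classes reduced to minority size
print(len(over_major), len(over_minor))    # both classes raised to majority size
```

The sketch makes the two drawbacks visible: undersampling throws away 17 of the 20 majority samples, while oversampling adds 17 exact copies of only 3 distinct minority samples.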
Solutions for Data Insufficiency
(1) Data Preprocessing. The issue of data insufficiency can be addressed by
generating more data artificially. In this context, some methods used to tackle
data imbalance, such as random oversampling and the synthetic minority oversampling
technique (SMOTE) and its variants, are also suitable for coping with the data insufficiency problem.
(2) Algorithm Modification. At the algorithm level, the common solution is to
combine data preprocessing with existing machine learning algorithms. In addition,
transfer learning, which is based on the idea of acquiring knowledge from
one problem/field (source domain) and applying it to the learning tasks of a new
problem/area (target domain), is a promising approach to overcoming the
problem of data insufficiency [5]. In self-healing, some learning tasks are similar to
those in other networks. For example, enough data may be available from
industrial or wireless sensor networks for tasks such as error recovery and
intrusion detection.
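The idea of generating artificial data can be illustrated with a simplified, SMOTE-style interpolation sketch: each synthetic sample is placed on the line segment between a minority sample and one of its nearest minority-class neighbors. This is not a full SMOTE implementation; the function name, the toy two-feature points, and the parameter values are assumptions made for illustration.

```python
import math
import random


def smote_like(minority, n_new, k, rng):
    """Create n_new synthetic points by interpolating between minority neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Toy minority samples with two features each (values are illustrative only).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like(minority, n_new=5, k=2, rng=random.Random(0))

for point in new_points:
    print(point)
```

Unlike plain duplication, the synthetic points are new values lying between existing minority samples, which is why this family of methods is often preferred over random oversampling when overfitting is a concern.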
Result
We use the simulation scenario proposed in [11] and consider only the binary
classification problem in this article. The data set is imbalanced: there are
117 fault samples and 3,363 fault-free samples in classes 0 and 1, respectively. We split the
entire data set into a training set (2,783 samples) and a testing set (696
samples). Each sample is composed of seven key performance indicators (KPIs):
retainability, handover success rate, reference signal received power (RSRP),
reference signal received quality (RSRQ), signal-to-interference-plus-noise ratio
(SINR), throughput, and distance. For performance evaluation, we show the results
through ROC curves and use the area under the ROC curve (AUC) to compare
different classification algorithms. The larger the AUC, the better the classification
performance.
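As a rough illustration of the AUC metric, the sketch below computes it through its pairwise interpretation: the AUC equals the probability that a randomly chosen fault sample receives a higher fault score than a randomly chosen fault-free sample, with ties counted as one half. The labels and scores are made-up toy values, not results from the case study, and the function name is hypothetical.

```python
def auc_score(labels, scores, positive_label):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for l, s in zip(labels, scores) if l == positive_label]
    negatives = [s for l, s in zip(labels, scores) if l != positive_label]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute the AUC")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Toy example: class 0 = fault (treated as positive), class 1 = fault-free.
labels = [0, 0, 1, 1, 1, 1]
fault_scores = [0.9, 0.4, 0.5, 0.3, 0.2, 0.1]  # higher means "more likely a fault"

auc = auc_score(labels, fault_scores, positive_label=0)
print(auc)  # 7 of the 8 fault/fault-free pairs are ranked correctly: 0.875
```

Because the AUC depends only on how samples are ranked, not on how many belong to each class, it is better suited than accuracy to comparing classifiers on imbalanced data.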
