Presentation 8
For example, many learning algorithms are devised to detect cell outage and to
compensate degraded network performance of problematic cells.
Materials and Methods
(i) Data Imbalance. In cellular networks, rare events (e.g., network failures)
usually make the collected data sets imbalanced.
Imbalanced data can significantly degrade the performance of classifiers, which
tend to be skewed towards the majority class. However, existing schemes rarely
take the issue of data imbalance into account.
(ii) Data Insufficiency. A lack of high-quality data can result in severe
overfitting of learning models (e.g., classifiers). On the one hand, data sets obtained
from high-fidelity network simulators may not fully represent the measurements in
practical cellular networks. On the other hand, real data from network operators
(e.g., log data) may not be well organized and labeled, which makes it difficult to
extract effective information and build knowledge from these data.
(iii) Cost Insensitivity. Most of the existing schemes pursue a low detection error
rate, while ignoring the fact that different types of misclassification errors can cause
different losses to the operators. In such cases, accuracy alone is an inadequate
evaluation criterion, and cost sensitivity should be considered.
(iv) Non-Real-Time Response. Most of the existing self-healing schemes do not meet
real-time response requirements because they are reactive. Specifically,
they mainly rely on post-event operations (e.g., diagnosing after malfunctions occur).
Designing proactive schemes that reduce the delay and enable real-time response is
challenging.
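The point of item (iii) can be illustrated with a small sketch. It compares two made-up classifiers on accuracy and on total misclassification cost. The labels, the predictions, the function names, and the cost values (10 for a missed fault, 1 for a false alarm) are all assumptions chosen for illustration; they are not taken from the original work.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def total_cost(y_true, y_pred, cost_missed_fault, cost_false_alarm):
    """Sum of per-error costs; correct predictions cost nothing."""
    cost = 0
    for t, p in zip(y_true, y_pred):
        if t == "fault" and p == "normal":
            cost += cost_missed_fault   # fault that went undetected
        elif t == "normal" and p == "fault":
            cost += cost_false_alarm    # healthy cell flagged as faulty
    return cost


# Hypothetical data: 5 faulty cells and 95 healthy cells.
y_true = ["fault"] * 5 + ["normal"] * 95

# Classifier A never raises an alarm, so it misses every fault.
pred_a = ["normal"] * 100
# Classifier B detects all 5 faults but raises 8 false alarms.
pred_b = ["fault"] * 5 + ["fault"] * 8 + ["normal"] * 87

# Hypothetical costs: a missed fault is assumed 10x worse than a false alarm.
COST_MISSED_FAULT = 10
COST_FALSE_ALARM = 1

acc_a = accuracy(y_true, pred_a)
acc_b = accuracy(y_true, pred_b)
cost_a = total_cost(y_true, pred_a, COST_MISSED_FAULT, COST_FALSE_ALARM)
cost_b = total_cost(y_true, pred_b, COST_MISSED_FAULT, COST_FALSE_ALARM)

print(f"A: accuracy={acc_a:.2f}, cost={cost_a}")
print(f"B: accuracy={acc_b:.2f}, cost={cost_b}")
```

Under these assumed costs, classifier A has the higher accuracy but the higher total cost, which is the situation item (iii) warns about.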
Data imbalance often occurs in machine learning and data mining when at least
one class contains considerably more samples than the other classes.
For convenience, we term the class containing numerous samples as the majority
class, and the class including a relatively small number of samples as the minority
class.
The ratio of the number of samples between the minority and majority classes is
used to measure the degree of data imbalance. In general, when this ratio is close
to 1, the data imbalance can be negligible. On the other hand, when the ratio is
significantly less than 1, the imbalance may hamper the performance of classifiers
significantly.
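As a minimal sketch, the imbalance ratio can be computed directly from the class labels. The function name `imbalance_ratio` is an assumption made for illustration; the class sizes are the ones this document reports for its evaluation data set (117 fault samples and 3,363 fault-free samples).

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return the minority-class count divided by the majority-class count."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes to measure imbalance")
    return min(counts.values()) / max(counts.values())


# Class sizes reported in this document: class 0 = fault, class 1 = fault-free.
labels = [0] * 117 + [1] * 3363
ratio = imbalance_ratio(labels)
print(f"imbalance ratio = {ratio:.4f}")
```

A ratio this far below 1 falls into the regime where the imbalance may significantly hamper classifier performance.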
In self-healing, fault detection and diagnosis can be considered typical classification
problems. Accordingly, existing machine learning-based classification methods can
be applied, with measurement data collected during network operation used to
train the corresponding classifiers.
Data Insufficiency
Data insufficiency arises mainly due to the following reasons. First, for most
researchers in universities and research institutions, acquiring sufficient data from
network operators is not an easy task due to privacy and business issues.
In the existing literature on self-healing, most of the works use data from some
high-fidelity network simulators (e.g., NS3, Vienna-LTE, and LTE-Sim).
Though these simulators provide a good simulation environment, the data collected
via simulations cannot fully represent real network scenarios.
Also, although mobile network measurement data may be collected by means of
third-party sniffers or some applications in mobile devices, some measurement data
(e.g., fault-indicating data) are difficult to collect.
Second, network operators have a huge amount of operation data which are stored
in system logs. However, these data may not be well organized and labeled.
Cost Insensitivity
(1) Data Preprocessing. The issue of data insufficiency can still be addressed by
generating more data artificially. In this context, some methods used to tackle
data imbalance, such as random oversampling and SMOTE and its variants,
are also suitable for coping with the data insufficiency problem.
(2) Algorithm Modification. At the algorithm level, the common solution is to
combine data preprocessing with existing machine learning algorithms. In addition,
transfer learning, which acquires knowledge from one problem/field (source domain)
and adapts it to the learning tasks of a new problem/area (target domain), is a
promising approach to overcoming the problem of data insufficiency [5]. In
self-healing, some learning tasks are similar to those in other networks. For
example, enough data may be available from industrial or wireless sensor networks
for tasks such as error recovery and intrusion detection.
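The data preprocessing step in (1) can be sketched as follows. This is a simplified illustration of random oversampling and of SMOTE-style interpolation between nearest minority neighbours, not the reference SMOTE implementation; the function names, the toy samples, and the parameter values are assumptions.

```python
import random


def random_oversample(minority, target_size, rng):
    """Duplicate randomly chosen minority samples until target_size is reached."""
    n_extra = max(0, target_size - len(minority))
    extra = [list(rng.choice(minority)) for _ in range(n_extra)]
    return [list(s) for s in minority] + extra


def smote_like(minority, n_new, k, rng):
    """Create n_new synthetic samples, each on the line segment between a
    random minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2 or k < 1:
        raise ValueError("need at least two minority samples and k >= 1")

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [s for s in minority if s is not base]
        neighbours = sorted(others, key=lambda s: sq_dist(base, s))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical two-feature minority (fault) samples.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
rng = random.Random(0)  # fixed seed so the sketch is repeatable

oversampled = random_oversample(minority, target_size=10, rng=rng)
synthetic = smote_like(minority, n_new=6, k=2, rng=rng)

print(len(oversampled), "samples after random oversampling")
print(len(synthetic), "synthetic samples from interpolation")
```

Random oversampling only repeats existing samples, whereas the interpolation step produces new points that lie between existing minority samples.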
Result
We use the simulation scenario proposed in [11]. We only consider the binary
classification problem in this article. An imbalanced data set is utilized, which contains
117 fault samples (class 0) and 3,363 fault-free samples (class 1). We split the
entire data set into a training set (2,783 samples) and a testing set (696
samples), and each sample is composed of seven key performance indicators (KPIs):
retainability, handover success rate, reference signal received power (RSRP),
reference signal received quality (RSRQ), signal-to-interference-plus-noise ratio
(SINR), throughput, and distance. For performance evaluation, we show the results
through ROC curves and use the area under the ROC curve (AUC) to compare
different classification algorithms. The larger the AUC, the better the classification
performance.
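The AUC criterion can be illustrated with a minimal sketch. It uses the pairwise-comparison formulation, in which the AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one (ties count as one half). The function name, the labels, and the scores are made up for illustration, and which class is treated as positive is an assumption left as a parameter.

```python
def roc_auc(y_true, scores, positive=1):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical labels and classifier scores for six test samples.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]

auc = roc_auc(y_true, scores)
print(f"AUC = {auc:.3f}")
```

The double loop is quadratic in the number of samples, which is acceptable for a sketch; a rank-based computation would be preferable for large test sets.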