

Expert Systems With Applications 64 (2016) 56–68

Contents lists available at ScienceDirect

Expert Systems With Applications


journal homepage: www.elsevier.com/locate/eswa

Combination of multiple diagnosis systems in Self-Healing networks


David Palacios, Emil J. Khatib, Raquel Barco∗
Communications Engineering Dept., University of Málaga, 29071, Málaga, Spain

Article info

Article history:
Received 8 January 2016
Revised 20 July 2016
Accepted 21 July 2016
Available online 22 July 2016

Keywords:
LTE
Self-healing
Root cause analysis
Self-organizing networks (SON)
Hybrid ensemble classifier
Automatic fault identification

Abstract

The Self-Organizing Networks (SON) paradigm proposes a set of functions to automate network management in mobile communication networks. Within SON, the purpose of Self-Healing is to detect cells with service degradation, diagnose the fault cause that affects them, rapidly compensate the problem with the support of neighboring cells and repair the network by performing some recovery actions.

The diagnosis phase can be designed as a classifier. In this context, hybrid ensembles of classifiers enhance the diagnosis performance of expert systems of different kinds by combining their outputs. In this paper, a novel scheme of hybrid ensemble of classifiers is proposed as a two-step procedure: a modeling stage of the baseline classifiers and an application stage, when the combination of partial diagnoses is actually performed. The use of statistical models of the baseline classifiers allows an immediate ensemble diagnosis without running and querying them individually, thus resulting in a very low computational cost in the execution stage.

Results show that the performance of the proposed method compared to its standalone components is significantly better in terms of diagnosis error rate, using both simulated data and cases from a live LTE network. Furthermore, this method relies on concepts which are not linked to a particular mobile communication technology, allowing it to be applied either on well established cellular networks, like UMTS, or on recent and forthcoming technologies, like LTE-A and 5G.

© 2016 The Authors. Published by Elsevier Ltd.
This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

The growing demand for mobile services with ever-increasing bandwidth and the expanding number of users make necessary the deployment of new and more efficient mobile communication networks over the existing ones (GSM, UMTS), such as Long-Term Evolution (LTE). However, the complexity of this heterogeneous scenario, which comprises several Radio Access Technologies (RAT), requires challenging maintenance and complex operational tasks. Mobile operators need to offer new demanding services without increasing either operational expenditures (OPEX) or capital expenditures (CAPEX). In order to deal with that problem, the 3rd Generation Partnership Project (3GPP) has proposed Self-Organizing Networks (SON) (3GPP (d)) as networks that include mechanisms to automate network procedures in order to help mobile operators with their management work, providing significant cost reduction. This automation of network management will also be essential in near-future technologies, like LTE-Advanced and 5G (3GPP (b)).

SON comprises three groups of functions: Self-Configuration, Self-Optimization and Self-Healing. The aim of the latter is to autonomously solve the problems that a cell, with service degradation or outage, could present (3GPP (e); Barco, Lázaro, and Muñoz (2012)). This is done by means of four stages:

• Fault Detection: Responsible for finding cells with problems, i.e., cells experiencing service outage or just suffering an unacceptable service degradation.
• Diagnosis of the fault cause: In this step, the actions to be performed in order to recover the system from the degradation it is suffering are decided. This step can be divided into two sub-stages: Fault Identification, that is, identifying the fault cause based on observable symptoms such as Key Performance Indicators (KPI) and alarms; and Action Identification, which corresponds to the decision of what tasks to perform to recover the system's normal performance.
• Fault recovery: In this step, the proposed solutions are carried out.
• Fault compensation: Since diagnosing the fault and repairing it normally takes some time, compensation aims to diminish the impact of the fault by changing parameters in neighboring cells.

∗ Corresponding author. Fax: +34952132027.
E-mail addresses: [email protected] (D. Palacios), [email protected] (E.J. Khatib), [email protected] (R. Barco).
http://dx.doi.org/10.1016/j.eswa.2016.07.030
0957-4174/© 2016 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

This paper is focused on the diagnosis task, in particular on fault identification, also called root cause analysis. Once a problem has been detected in a cell, root cause analysis identifies the fault cause given the value of performance indicators, alarms, counters, mobile traces, etc. In the context of cellular networks, some diagnosis systems have been recently proposed. Barco, Díez, Wille, and Lázaro (2009) and Barco, Lázaro, Wille, Díez, and Patel (2009) proposed diagnosis systems based on Bayesian Networks. Szilágyi and Nováczki (2012) used a scoring system in order to determine how well a specific case fits a diagnosis. Nováczki (2013) enhanced the previous system by adding profiling techniques. The method in Khatib, Barco, Gómez-Andrades, and Serrano (2015) was based on fuzzy logic and genetic algorithms. Gómez-Andrades, Muñoz, Serrano, and Barco (2016) proposed a diagnosis system based on Self-Organized Maps (SOM).

Each of the previous methods has its pros and cons. In practice, this makes the selection of the diagnosis technique cumbersome when the aim is to deploy an automatic diagnosis system in a real network. Furthermore, once the technique has been decided, e.g., fuzzy logic, operators normally design several standalone diagnosis models. This is due to the fact that, firstly, different troubleshooting experts will build different models and secondly, when models are learnt from historical cases, different training datasets will result in different models.

To cope with the limitations of standard classifying systems in terms of accuracy and dataset-dependent performance, ensembles of classifiers arose. Within these, homogeneous and heterogeneous (commonly known as hybrid) ensembles of classifiers may be found, where the former stand for the ensemble of classifiers of the same kind and the latter stand for the combination of different kinds of systems and datasets. Homogeneous ensembles have been widely studied and are still extensively used in different fields (Begum, Chakraborty, & Sarkar, 2015; Liu, Chen, Song, & Han, 2009; Shen & Chou, 2006; Wiezbicki & Ribeiro, 2016). In this paper, by contrast, a method for the generalized combination of multiple diagnosis systems based on a hybrid ensemble approach is proposed and tested in the context of cellular networks, which to the authors' knowledge is a research area still to be explored. The proposed work describes a method to gather, combine and use the knowledge held by any kind of expert system in any field that makes use of a classifying or diagnosis system. In this work, the proposed method is applied to fault cause diagnosis in cellular networks, where the expertise may be provided either by a human troubleshooting expert or by a database of cases assessed by automatic diagnosis systems. The proposed method allows combining diagnosis systems in a wide sense, being able to merge both several diagnosis models (expertise) and the tools used for their application (automatic diagnosis techniques) in the form of supervised or unsupervised classifying systems.

Up to now, hybrid ensembles of classifiers have mainly been based on a set of baseline systems which must first assess the cases under test and, consequently, provide partial diagnoses which are finally combined into a final decision using a majority vote scheme (Ciocarlie, Lindqvist, Nováczki, & Sanneck, 2013; Gandhi & Pandey, 2015; Wei et al., 2014). This procedure requires a relatively high number of diagnosis techniques to be run in the test stage and, therefore, a noticeable expenditure of computational and time resources. The proposed work, however, presents a method which allows combining the diagnoses that the standalone diagnosis systems would output for a case under test without actually needing them to be run, thus lightening the computational weight of the test stage.

The main contributions of this paper are:

• A method to combine any number and kind of different standalone classifiers as well as different sources of expert knowledge in order to get an enhanced performance compared to that of the base classifiers. In the context of troubleshooting in cellular networks this comprises the combination of several diagnosis models and techniques for the automatic diagnosis.
• A method to lighten the computational cost of the evaluation stage in hybrid ensembles of classifiers. This work proposes a scheme to model and emulate the behavior of every standalone classifier so these need not be continuously queried before combining their partial diagnoses.

This paper is organized as follows. Section 2 presents the problem formulation. Section 3 introduces the proposed method for combining multiple baseline diagnosis systems. In Section 4 results are analyzed by means of both a network simulator and data from a live LTE network. In Section 5 the future lines of work are outlined. Finally, Section 6 summarizes the main conclusions.

2. Problem formulation

2.1. Root cause analysis in mobile communications networks

In the same way that a patient is diagnosed by a doctor based on the symptoms he shows, the status of a communications network may be diagnosed based on a set of performance indicators. This diagnosis task, also called root cause analysis or troubleshooting, is often carried out by human experts using their knowledge of the underlying relations between the observed indicators and the status of the network. However, the number of symptoms (counters, alarms, KPIs, call traces, etc.) and possible fault causes the expert has to deal with increases as networks grow in size and complexity, which makes this task very difficult and time-consuming.

Furthermore, the current manual troubleshooting is a layered task, guided by a Trouble Ticket (TT) system. In this problem solving system, a group of specialists tries first to diagnose and solve the problem by performing some simple checks. If they cannot find the root of the problem, this is raised to a more specialized team (and so on), which performs a deeper study on the symptoms the case exhibits and resorts to field engineers in case they need to make some on-site checks.

As a response to this increasingly inefficient procedure, automatic diagnosis systems arose in an attempt to imitate the way troubleshooters act. Fig. 1 shows the basic scheme of a system for automatic diagnosis. It is composed of an automatic diagnosis technique and a diagnosis model. The first is an artificial intelligence system that outputs a diagnosis taking a set of symptoms (e.g., KPIs) from a test case as its input. The second represents the knowledge a human expert would have on the underlying relations between the symptoms and the fault causes and may take different forms depending on the diagnosis technique it is destined to work with. For example, a diagnosis model may consist of the parameters (e.g., prior probabilities and probability density functions) required by a given diagnosis technique (e.g., Bayesian classifier) or a set of rules for other techniques (e.g., Case-Based Reasoning, CBR). As can be seen in this figure, the diagnosis model may be built from a set of training cases by means of a machine learning algorithm or by troubleshooting experts by gathering their knowledge. The proposed method aims to combine the knowledge acquired by any number and kind of diagnosis models and automatic diagnosis techniques in an attempt to reduce the errors in fault detection and diagnosis.

Fig. 1. Scheme of an automatic diagnosis system.

2.2. Automated diagnosis from the classification theory

A diagnosis system is a method that, given a set of indicators or symptoms (called case hereafter), intends to infer the cause that provoked them. In this sense, a diagnosis system acts as a classifying system in which the attributes from the cases to be classified correspond to the symptoms from the case to be diagnosed, and the classes to be assigned correspond to the causes to be inferred. This issue has long been investigated in data mining theory (Wu et al., 2008), and many types of classifiers have been developed over the years in an attempt to get the maximum information the cases under diagnosis could provide. However, no algorithm has so far proven to be clearly better than the rest for all kinds of input data. One reason for the increasing efforts in the related research is that the performance of a classifier normally depends on the nature and distribution of the data it has to work with. For this reason, the present paper focuses not only on combining different diagnosis models but also on offering the possibility to combine multiple classifiers in the form of automatic diagnosis techniques.

Let us assume we have a set of $M$ fault causes to diagnose and $R$ diagnosis systems (either diagnosis model or technique) to combine, and that each of these systems can have a subset of these causes as its output, namely, $W^r$ for the system $r$. In this scenario, the set of causes a diagnosis system can identify may be different from one system to another. This can be seen in (1), where each row stands for a $W^r$ and the element $w_m^r$ stands for the $m$th fault cause, diagnosed by the $r$th system. According to this, each row may be different from another.

$$
\begin{bmatrix}
w_1^1 & \cdots & w_m^1 & \cdots & w_M^1 \\
\vdots & \ddots & \vdots & \ddots & \vdots \\
w_1^r & \cdots & w_m^r & \cdots & w_M^r \\
\vdots & \ddots & \vdots & \ddots & \vdots \\
w_1^R & \cdots & w_m^R & \cdots & w_M^R
\end{bmatrix}
\tag{1}
$$

In a diagnosis system, a case, $x$, is characterized by its symptoms, $x_n$, where $x = \{x_1, x_2, \ldots, x_N\}$, having a total of $N$ possible symptoms. However, each diagnosis system may consider only a subset of these symptoms, namely, $N^r$ for the diagnosis system $r$.

In the context of diagnosis systems for mobile communication networks a case corresponds to an observation or measurement from the network; a symptom may be an event counter, a Key Performance Indicator (KPI), a call trace or an alarm and the causes are seen as the network states, among which the normal and several fault states may be distinguished. In this paper, some results from the theory of classifiers are used, extended and applied in this context in an attempt to combine the knowledge acquired by these $R$ diagnosis systems, developing a more reliable and accurate root cause analysis system for communication networks.

2.3. State-of-the-art in ensemble-based classification algorithms

This section aims to provide a brief survey of the most recently proposed ensemble-based systems, most of which have been used in classifying tasks in areas not related to mobile communications.

Ensembles of classifiers may nowadays be classified into homogeneous and heterogeneous or hybrid. The first stand for those ensembles which put together instances of classifiers of the same kind, e.g., several k-Nearest Neighbor (kNN) classifiers. Conversely, in heterogeneous ensembles a set of classifiers of different kinds are put together, e.g., a kNN and a NN (Neural Network). This is the scope of the present work, as the latter also allow the combination of different sources of expert knowledge within a single enhanced diagnosis system.

One of the earliest works on ensemble methods proposed to partition the feature space (i.e., the vector space in which the features of the cases to be diagnosed are defined) and to assign each part to a different classifier which is supposed to be the best for this subset of cases (Dasarathy & Sheela, 1979). This idea has been widely explored and has given birth to the so-called mixture of experts algorithm (Jacobs, Jordan, Nowlan, & Hinton, 1991; Yuksel, Wilson, & Gader, 2012), which is the paradigm for the classifier-selection type of ensemble methods. Under this approach, only one classifier is working at a time and its selection is determined by the partition the case under test belongs to.

Conversely, in classifier-fusion methods all classifiers are usually trained over the entire feature space. The classifier combination process involves merging the individual classifiers to obtain a system that outperforms the standalone classifiers. This is the basis for the widely used bagging and boosting predictors (Breiman, 1996; Freund & Schapire, 1997), AdaBoost being an example of the latter and one of the best known and most used classification algorithms nowadays. Classifier fusion methods can also be divided into those which work with classification labels only and those which make use of a continuous valued output for each classifier for every class. In this case, the outputs can be seen as the support an expert gives to a class in terms of the class-conditional posterior probabilities (Kuncheva, 2002).

Some examples of ensemble methods as enhanced systems for fault disclosure can be found in the literature with many different purposes. In Liu et al. (2009), a homogeneous ensemble of neural networks with cross-validation for fault diagnosis of analog circuits with tolerance is proposed. In Shen and Chou (2006), several kNN classifiers are put together in a majority-vote ensemble to classify the patterns that several proteins may exhibit when folded. In Begum et al. (2015), a homogeneous ensemble of SVMs (Support Vector Machines) is proposed to identify different types of cancer from a genetic analysis. Wiezbicki and Ribeiro (2016) propose a homogeneous ensemble of neural networks, combined by means of a weighted majority vote in a sensor network for the classification of gases.

Regarding the most recent works on hybrid ensembles of classifiers, in Wei et al. (2014) $n$ ensembles are made up by combining $3n$ baseline classifiers. Each ensemble comprises three supervised methods: a decision tree, a support vector machine and a kNN algorithm. In each ensemble, the diagnoses from these baseline classifiers are fused applying a weighted majority vote, where each vote is weighted by the performance each individual classifier

shows during a prior training stage. Then, the $n$ resulting diagnoses are combined into a final diagnosis applying a non-weighted majority vote. In this case, all the baseline classifiers must be supervised diagnosis systems, as their performance must be previously known in order to weigh their votes in the first stage. Unlike this, the proposed method allows the user to combine any kind of diagnosis system, either supervised or unsupervised. Even more importantly, regarding the operation stage, in Wei et al. (2014), whenever a new case is to be diagnosed it must pass through two steps, one of them made up of $3n$ systems which must each first output a diagnosis, resulting in a high computational cost. The method in the proposed work, however, needs the test cases to be assessed by only one step, which, furthermore, only consists of some algebraic calculations. Once the training stage has been performed, new cases will be diagnosed at a minimum computational cost.

As for Gandhi and Pandey (2015), a two-step method is again proposed. The first step consists of a learning stage for the base classifiers and the second step consists of a majority vote-based combining stage. Again, and similarly to Wei et al. (2014), every baseline classifier is required to first diagnose every new case in the application step, which results in a high computational cost compared to that of the application (test) step in the proposed method.

In the context of cellular networks, Ciocarlie et al. (2013) propose a hybrid ensemble of classifiers to detect anomalies in the performance indicators of a cell. This work is focused on fault detection. Unlike this, the proposed work does not just find a performance degradation, but identifies the fault cause behind it. Regarding its implementation, this method relies on the use of a pool of models. New models are added to this pool whenever a change in the configuration parameters of the network takes place. A number of $N_{CM} \times \left(N_{univariate} \times N_{KPI}^{U} + N_{multivariate} \times N_{KPI}^{G}\right)$ models, and thus, instances of automatic techniques must be assessed for every single new case under test. In this expression, $N_{CM}$ stands for the number of sets of network configuration parameters considered; $N_{univariate}$ and $N_{KPI}^{U}$ stand for the number of univariate techniques considered and the number of KPIs acting as their input in each model; and $N_{multivariate}$ and $N_{KPI}^{G}$ stand for the number of multivariate techniques used and the number of groups of KPIs considered in each model. Like in Ciocarlie et al. (2013), Wei et al. (2014) and Gandhi and Pandey (2015), before an ensemble decision can be made, a high number of baseline classifiers must first be queried. And again similarly to Ciocarlie et al. (2013), according to Wei et al. (2014) and Gandhi and Pandey (2015), all the partial decisions meet at a combining stage based on a weighted majority vote.

To the authors' knowledge, no ensemble method for fault cause diagnosis in cellular networks has been proposed as of today.

3. Method for combining multiple automatic diagnosis systems

In this section, a method for combining the knowledge acquired by any number and kind of standalone automatic diagnosis systems by means of a classifier-fusion scheme is proposed. The proposed method consists of two stages: the construction of the behavior models of the automatic diagnosis systems (Section 3.1), and the combination of these models in order to make a more accurate diagnosis on the cases from a testing set (Section 3.2). This can be seen in Figs. 2 and 5. Before this method can be applied, two sets of $N$-dimensional cases must be distinguished: the modeling set and the testing set, where each of these $N$ dimensions stands for a working KPI. The modeling set will be used in the first stage and the testing set in the second.

Fig. 2. Proposed method for combining diagnosis systems. Stage 1: Construction of the behavior models.

3.1. Construction of the behavior models

The baseline diagnosis systems are to be combined by means of mixing their models of behavior, which need to be extracted first. Once the diagnosis model from each diagnosis system has been built (either from training cases via a machine learning method (Khatib, Barco, Gómez-Andrades, Muñoz, & Serrano, 2015), or from the experts' knowledge (Gómez-Andrades et al., 2016)), each diagnosis system can start the classification (Fig. 1). In this stage, every case from the modeling set is diagnosed by the $R$ systems. That is, each system assigns to each case one of the $M$ possible fault causes; in particular, one of the causes that system can discern. This can be seen in Fig. 2, where the case $x$ acts as the input for the $R$ systems and, in turn, they assign it $R$ diagnosis labels. If the system $r$ diagnoses the case $x$ with the cause $m$, this case receives the label $w_m^{r*}$. In this way, each diagnosis system makes a different partition of the modeling set into $|W^{r*}|$ disjoint subsets, whose maximum is $|W^r|$, that is, the number of causes that system considers (Fig. 3), where $|A|$ is the number of elements in the set $A$. This leads to finally identifying $M^*$ different causes, where $M^*$ is the number of elements in the union of $W^{r*}$ over $r$, with $M^* \le M$. According to this, a new matrix may be written from (1), substituting every row (i.e., every $W^r$) by its corresponding $W^{r*}$. Each row would represent one of the partitions of the modeling set and each column would represent how a cause "is seen" by each diagnosis system regarding the KPIs the cases belonging to that $w_m^r$ exhibit.

It should be noticed that each of these $M^*$ subsets contains a number of $|N^r|$-dimensional cases. At this point, the behavior of the diagnosis system $r$ is modeled through the estimation of the

statistical distributions of the $N^r$ KPIs for the cases belonging to $W^{r*}$. That is, the behavior of each diagnosis system is modeled by means of $N^r \times M^*$ PDFs. The estimated statistical distribution of the $n$th KPI for the subset of cases diagnosed as $m$ by the diagnosis system $r$ is $p(x_n|w_m^{r*})$. The choice of the PDF that estimates each one of these distributions is done according to the maximum likelihood (ML) criterion. To do so, some families of PDFs are considered in the fitting procedure (Table 1). In a first step, the distribution of the KPI $x_n$ from the cases labeled as $w_m^{r*}$ is fitted according to the ML criterion with each one of the considered families of PDFs. This results in a set of candidates for estimating its distribution. These PDFs are then sorted by their likelihood and the one with the maximum value is chosen to be the estimation for the KPI.

Fig. 3. Modeling set divided into different subsets by means of two different partitions: on the left, the partition the first diagnosis system makes, having $W^1 = \{w_1^1, w_2^1, w_3^1\}$ with $|W^1| = |W^{1*}| = 3$; on the right, the partition the diagnosis system $R$ makes, having $W^R = \{w_1^R, w_2^R, w_3^R\}$ and $W^{R*} = \{w_1^{R*}, w_2^{R*}\}$. In this last case, the diagnosis system $R$ only diagnosed the causes 1 and 2 despite also being capable of identifying the fault cause 3.

Table 1
Families of PDFs considered for the estimation of $p(x_n|w_m^{r*})$.

| Distribution | PDF | Parameters |
|---|---|---|
| Beta | $\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\, x^{a-1}(1-x)^{b-1}$ | $a, b$ |
| Normal | $\frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right)$ | $\mu, \sigma$ |
| Log-normal | $\frac{1}{x\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{\ln x-\mu}{\sigma}\right)^2\right)$ | $\mu, \sigma$ |
| Exponential | $\lambda \exp(-\lambda x)$ | $\lambda$ |
| Gen. extreme value | $\frac{1}{\sigma}\, t(x)^{\xi+1} \exp(-t(x))$, with $t(x) = \left(1+\xi\frac{x-\mu}{\sigma}\right)^{-1/\xi}$ if $\xi \neq 0$ and $t(x) = \exp(-(x-\mu)/\sigma)$ if $\xi = 0$ | $\mu, \sigma, \xi$ |
| T-location | $\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sigma\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left[1+\frac{1}{\nu}\left(\frac{x-\mu}{\sigma}\right)^2\right]^{-\frac{\nu+1}{2}}$ | $\nu, \mu, \sigma$ |
| Nakagami | $\frac{2m^m}{\Gamma(m)\,\Omega^m}\, x^{2m-1} \exp\left(-\frac{m}{\Omega}x^2\right)$ | $m, \Omega$ |
| Gamma | $\frac{1}{\Gamma(k)\,\theta^k}\, x^{k-1} \exp\left(-\frac{x}{\theta}\right)$ | $k, \theta$ |
| Logistic | $\frac{\exp\left(-\frac{x-\mu}{s}\right)}{s\left(1+\exp\left(-\frac{x-\mu}{s}\right)\right)^2}$ | $\mu, s$ |
| Log-logistic | $\frac{(\beta/\alpha)(x/\alpha)^{\beta-1}}{\left(1+(x/\alpha)^{\beta}\right)^2}$ | $\alpha, \beta$ |
| Weibull | $\frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} \exp\left(-\left(\frac{x}{\lambda}\right)^k\right)$ | $\lambda, k$ |
| Rayleigh | $\frac{x}{\sigma^2} \exp\left(-\frac{1}{2}\left(\frac{x}{\sigma}\right)^2\right)$ | $\sigma$ |
| Rice | $\frac{x}{\sigma^2} \exp\left(-\frac{x^2+\nu^2}{2\sigma^2}\right) I_0\left(\frac{x\nu}{\sigma^2}\right)$ | $\nu, \sigma$ |

The reason for considering these families of PDFs is to get the best estimation of the distribution of the KPI $x_n$ given that it belongs to $w_m^{r*}$. Fig. 4a shows a normalized histogram of the KPI "95th percentile RSRP" from the cases labeled as $w_m^{r*}$. In this figure, two families of PDFs have been used in an attempt to fit the underlying histogram, the normal and the generalized extreme value. As can be seen, the latter fits it better, resulting in a higher value in a likelihood-ratio test.

While some KPIs are counters and do not have an upper limit, there are others that are inherently bounded, as they are defined as a ratio. Normally, the beta PDF is used to fit these KPIs, usually limited between zero and one (Barco, Lazaro, Diez, & Wille, 2008). KPIs like the retainability or the accessibility often reach these extreme values, making the resulting fitted beta present asymptotes at these values. To avoid this issue the beta function used, $\beta'$, is slightly different from the one in Table 1, $\beta$. In this case,

$$
\beta'(x) = (1 - P_0 - P_1)\,\beta(x) + \frac{P_0}{h_\beta}\,\delta(x) + \frac{P_1}{h_\beta}\,\delta(x - 1), \tag{2}
$$

where $\beta(x)$ stands for the distribution fitted to a set with no extreme values; $P_0$ and $P_1$ stand for the relative frequency of cases with value 0 and 1 respectively; $\delta$ stands for the Dirac delta and $h_\beta$ stands for the step (the resolution) when computing $\beta'$. This can be seen in Fig. 4b, where a normalized histogram for the KPI retainability is shown.

3.2. Combination of behavior models

This stage uses the cases from the testing set. In the previous stage, the estimated functions have been seen as conditional probability density functions, that is, functions that express how the KPIs are distributed over the cases diagnosed with a given cause by a given system. However, this set of functions may be seen as likelihood functions by just changing the approach. From this point of view, the function depends on $w_m^{r*}$ given that an observation of the random variable $x_n$ (that is, the $n$th KPI) has taken place.

Now, assuming the KPIs are independent of each other, a joint probability function of $w_m^{r*}$, that is, $p(x|w_m^{r*})$, may be written as

$$
p(x|w_m^{r*}) = \prod_{n \in N^r} p(x_n|w_m^{r*}). \tag{3}
$$

Given (3), and assuming that the prior probability of each cause, $P(w_m^{r*})$, is given by $\frac{|w_m^{r*}|}{|W^{r*}|}$, the a posteriori probability for a diagnosis system $r$ to diagnose a case with the cause $m$ given that its KPIs are $x$ (i.e., $P(w_m^r|x)$) can be calculated by just applying Bayes' theorem. That is,

$$
P(w_m^r|x) =
\begin{cases}
\dfrac{p(x|w_m^{r*})\,P(w_m^{r*})}{\sum_{w_i^{r*} \in W^{r*}} p(x|w_i^{r*})\,P(w_i^{r*})} & \text{if } P(w_m^{r*}) > 0 \\[2ex]
0 & \text{if } P(w_m^{r*}) = 0
\end{cases}
\tag{4}
$$

At this point, some diagnosis system may not have diagnosed a given cause, as seen in Fig. 3. In such a case, $P(w_m^{r*})$ and thus (4) would be equal to zero. In any case, $M \times R$ a posteriori probabilities may be distinguished. Fig. 5 shows this when a case $y$ from the testing subset is to be diagnosed. As can be seen in this figure, the KPIs from the case $y$ act as input values in the behavior models of the $R$ diagnosis systems, i.e., the probability functions $p(y|w_m^{r*})$ for $w_m^{r*} \in W^{r*}$ and $r = 1, \ldots, R$. Then, the a posteriori probabilities $P(w_m^r|y)$ are computed using these together with $P(w_m^{r*})$ by means of Bayes' theorem.

Now, these $M \times R$ a posteriori probabilities together with the prior probabilities can be combined over $R$ using some algebraic functions, producing $M$ probabilities of the kind $P(w_m|y)_{Rule_t}$ per function used, where $m$ again stands for the cause and $t$ is an index for the rule used in the combination, that is,

$$
P(w_m|y)_{Rule_t} = f_{Rule_t}\left(P(w_m^1|y), \ldots, P(w_m^R|y);\, P(w_m)\right), \tag{5}
$$

where $P(w_m)$ is defined as the average of $P(w_m^{r*})$ over $r$.

Some rules for the combination of a posteriori probabilities given by several classifying systems are proposed in Kittler, Hatef, Duin, and Matas (1998) and studied further in Kuncheva (2002). In the first, those rules are derived from a maximum a posteriori (MAP) estimation in a multiple random variable scenario in an attempt to lighten the effort of computing several joint probability density functions. These rules are summarized in Table 2.
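As an illustration of Eqs. (3) and (4), the following minimal sketch computes the a posteriori probabilities of one behavior model for a single case. It is not the authors' implementation: the cause names, KPI names, prior values and the use of normal PDFs for every KPI are all hypothetical choices made only to keep the example short.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal PDF, used here as a stand-in for whichever family was fitted."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def likelihood(case, kpi_pdfs):
    """Eq. (3): product of the per-KPI PDFs, assuming independent KPIs."""
    return math.prod(pdf(case[kpi]) for kpi, pdf in kpi_pdfs.items())


def posteriors(case, model):
    """Eq. (4): Bayes' theorem over the causes of one diagnosis system.

    `model` maps cause -> (prior, {kpi_name: pdf}). A cause the system never
    diagnosed in the modeling set has prior 0 and gets posterior 0.
    """
    joint = {
        cause: (prior * likelihood(case, pdfs) if prior > 0 else 0.0)
        for cause, (prior, pdfs) in model.items()
    }
    total = sum(joint.values())
    if total == 0.0:  # no cause explains the case at all
        return {cause: 0.0 for cause in joint}
    return {cause: value / total for cause, value in joint.items()}


# Hypothetical behavior model of one system r (all numbers are made up).
model_r = {
    "coverage_hole": (0.5, {
        "rsrp_p95": lambda x: normal_pdf(x, -115.0, 5.0),
        "retainability": lambda x: normal_pdf(x, 0.90, 0.05),
    }),
    "interference": (0.5, {
        "rsrp_p95": lambda x: normal_pdf(x, -90.0, 5.0),
        "retainability": lambda x: normal_pdf(x, 0.95, 0.03),
    }),
    "overload": (0.0, {}),  # never diagnosed by this system
}

case_y = {"rsrp_p95": -112.0, "retainability": 0.91}
post_r = posteriors(case_y, model_r)
```

Note that no baseline classifier is queried here: only the fitted PDFs and the priors are evaluated, which is the source of the low computational cost claimed for the application stage.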

Fig. 4. (a) Normalized histogram for the KPI 95th percentile RSRP and two fitted PDFs: a generalized extreme value PDF in blue (round markers) and a normal PDF in red (square markers). (b) Normalized histogram for the KPI Retainability and a $\beta'$ PDF estimation. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
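The family selection illustrated in Fig. 4a (fit every candidate family by maximum likelihood, then keep the one with the highest likelihood) can be sketched as follows. This is only a sketch under stated assumptions: it covers three of the families of Table 1 (normal, exponential and log-normal), chosen because their ML estimates have closed forms, and it assumes strictly positive samples with non-zero spread. The sample values are invented.

```python
import math


def fit_normal(xs):
    """Closed-form ML fit of a normal PDF; returns (params, log-likelihood)."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)  # ML estimate: divide by n
    loglik = sum(
        -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))
        for x in xs
    )
    return {"mu": mu, "sigma": sigma}, loglik


def fit_exponential(xs):
    """Closed-form ML fit of an exponential PDF (lambda = 1 / sample mean)."""
    lam = len(xs) / sum(xs)
    loglik = sum(math.log(lam) - lam * x for x in xs)
    return {"lambda": lam}, loglik


def fit_lognormal(xs):
    """ML fit of a log-normal PDF: a normal fit on ln(x) plus the Jacobian term."""
    logs = [math.log(x) for x in xs]
    params, loglik_of_logs = fit_normal(logs)
    return params, loglik_of_logs - sum(logs)


FAMILIES = {
    "normal": fit_normal,
    "exponential": fit_exponential,
    "lognormal": fit_lognormal,
}


def rank_families(xs):
    """Return (family, params, log-likelihood) tuples, best fit first."""
    fits = [(name, *fit(xs)) for name, fit in FAMILIES.items()]
    return sorted(fits, key=lambda item: item[2], reverse=True)


# Hypothetical values of one KPI for the cases labeled with one cause.
kpi_samples = [98.0, 99.0, 100.0, 100.0, 101.0, 102.0]
ranking = rank_families(kpi_samples)
best_family, best_params, best_loglik = ranking[0]
```

Families without closed-form estimates, such as the generalized extreme value PDF of Fig. 4a, would need a numerical optimizer in place of the closed-form fits used here.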

Fig. 5. Proposed method for combining diagnosis systems. Stage 2: Combining the behavior models.

At this point, the fault cause with the maximum a posteriori probability is taken as the final diagnosis for each combination rule, $d_{\mathrm{Rule}_t}$. That is,

$$d_{\mathrm{Rule}_t} = \arg\max_{m} \{ P(w_m \mid y)_{\mathrm{Rule}_t} \}. \tag{6}$$

Note that a situation with M∗ < M means that there is at least one fault cause that has not been identified by any system. In this case, it would consequently be impossible for that cause to be the final diagnosis.

4. Proof of concept

In this section, the proposed method is assessed by combining two different diagnosis models. In the first test, each model is provided by a different expert; in the second test, each model comes from using a different machine learning algorithm for building the diagnosis models, provided the same set of training cases.
The proposed method has been evaluated and compared to the baseline systems by means of the following figures of merit:

• Diagnosis Error Rate (DER): the ratio of the number of problematic cases diagnosed as a fault cause different from the real one (misclassified cases), $N_{MPC}$, to the total number of problematic cases, $N_{PC}$.
• False Positive Rate (FPR): the ratio of the number of normal cases diagnosed as problematic cases, $N_{FP}$, to the total number of normal cases, $N_{NC}$.
• False Negative Rate (FNR): the ratio of the number of problematic cases diagnosed as normal cases, $N_{FN}$, to the total number of problematic cases, $N_{PC}$. This is the most critical metric, as it gives an idea of how often the diagnosis system interprets that there is no problem when some cells are actually malfunctioning.

Given these definitions, an Overall Error Rate (OER) may be defined as

$$OER = P_N \cdot FPR + P_{PC} \cdot (FNR + DER), \tag{7}$$

where $P_N$ stands for the relative frequency of the normal cases and $P_{PC}$ stands for the relative frequency of the faulty cases. This metric is useful to assess every method at a single glance. Since these figures of merit require the true cause to be known, the testing set used will include the real diagnosis.

Table 2
Algebraic rules for the combination of a posteriori probabilities.

Rule            $P(w_m \mid y)$
Product rule    $P(w_m)^{-(R-1)} \prod_{r=1}^{R} P(w_m^r \mid y)$
Sum rule        $(1-R)\,P(w_m) + \sum_{r=1}^{R} P(w_m^r \mid y)$
Max rule        $(1-R)\,P(w_m) + R \max_{r=1}^{R} P(w_m^r \mid y)$
Min rule        $P(w_m)^{-(R-1)} \min_{r=1}^{R} P(w_m^r \mid y)$
Median rule     $\operatorname{med}_{r=1}^{R} P(w_m^r \mid y)$

4.1. Combination of diagnosis models devised by multiple experts

4.1.1. Scenario
In this test, cases are provided by an LTE RAN simulator (Muñoz et al., 2011). This simulator considers an LTE network composed of 57 macro-cells evenly distributed in space and grouped into 19 three-sector sites. To perform this test, network configuration parameters similar to those used in Gómez-Andrades et al. (2015) and Gómez-Andrades et al. (2016) have been used. They can be seen in Table 3.
With this simulator, 1196 cases have been obtained. In this case, training cases are not needed since the diagnosis models have been defined by experts. It is assumed that a detection system is placed before the input of the diagnosis system, so that only the faulty cases are put under test, putting aside the cases that correspond to normal functioning. Therefore, in this test only the DER is taken into account.
In this scenario, six typical RAN fault causes have been considered (M = 6):

• Excessive downtilt: This situation takes place when the coverage area of a cell is too small, making the signal level at the edge of the cell too weak and causing a high number of handover failures. The quality of the signal in the surroundings of the cell is also decreased.
• Coverage hole: A cell has a coverage hole at some point inside its area when the power received by the user at this point from any cell is not enough to hold the service. This excessive attenuation can be caused by either obstacles or bad RF planning, and it mainly produces a high number of call drops.
• Inter-system interference: This fault cause may occur due to other cellular networks, like WCDMA. It is not always an easy issue to solve, since the fault usually comes from an outer system. This fault normally causes both the SINR and the average throughput to decrease.
• Too late handover: A too late handover takes place if a radio link failure occurs while the UE (User Equipment) is moving from one cell to another and the corresponding handover between these cells has not taken place yet. In that case, the UE will request a connection re-establishment from the second cell using the physical cell ID of the first cell and its Common Radio Network Temporary Identifier (C-RNTI) in that first cell, which will alert the second cell that a too late handover has occurred.
• Excessive uptilt: A cell suffers from excessive uptilt when its coverage area is larger than necessary, normally because of a bad configuration of the radiation parameters of the antennas. This situation can result in the overlapping of coverage areas from

Table 3
Simulation parameters for cells normal functioning.

Parameter                Configuration
Cellular layout          Hexagonal grid, 57 cells, cell radius 0.5 km
Transmission direction   Downlink
Carrier frequency        2.0 GHz
System bandwidth         1.4 MHz, 6 PRB (Physical Resource Block)
Frequency reuse          1
Propagation model        Okumura-Hata with wrap-around, Log-normal slow fading, σ_sf = 8 dB and correlation distance = 50 m
Channel model            Multipath fading, ETU model
Mobility model           Random direction, 3 kph
Service model            Full Buffer, Poisson traffic arrival
Base station model       Tri-sectorized antenna, SISO, P_TXmax = 43 dBm, Downtilt = 9°, Azimuth beamwidth = 70°, Elevation beamwidth = 10°
Scheduler                Time domain: Round-Robin, Frequency domain: Best Channel
Power control            Equal transmit power per PRB
Link Adaptation          Fast, CQI (Channel Quality Indicator) based, perfect estimation
Handover                 Triggering event = A3, HOM (Handover Margin) = 3 dB, Measurement type = RSRP
Radio Link Failure       SINR < −6.9 dB for 500 ms (Mehlführer, Wrulich, Colom Ikuno, Bosanska, and Rupp, 2009)
Traffic distribution     Evenly distributed in space
Time resolution          100 TTI (Transmission Time Interval) (100 ms)
Epoch & KPI time         100 s

Table 4
Parameters used for modeling fault causes in Section 4.1 and a priori probabilities for each cause.

Fault cause           Configuration                                    P(ω_m)
Excessive downtilt    Downtilt = [16, 15, 14]°                         0.18
Coverage hole         hole = [49, 50, 52, 53] dBm                      0.09
Inter-system interf.  P_TXmax = 33 dBm, Downtilt = 15°,                0.1
                      Azimuth beamwidth = [30, 60]°,
                      Elevation beamwidth = 10°
Too late HO           HOM = [6, 7, 8] dBm                              0.23
Excessive uptilt      Downtilt = [0, 1]°                               0.21
Lack of coverage      P_TXmax = [7, 8, 9, 10] dBm                      0.19

Table 5
Diagnosis models for the diagnosis systems used in test 1: used thresholds.

KPI                 Thresholds
Retainability       [0.973, 0.996]
HOSR                [0.899, 0.989]
RSRP [dBm]          [−76.9, −72.4]
RSRQ [dB]           [−18.8, −18.2]
SINR [dB]           [13, 14.5]
Throughput [kbps]   [96.2, 111.67]
Distance [km]       [0.838, 0.88]

possibly non-adjacent cells, producing a high number of handovers and call drops in this cell and its neighbors.
• Lack of coverage: A user suffers from weak coverage when the Signal-to-Interference-Plus-Noise Ratio (SINR) measured in the cell is below the minimum level needed to maintain a planned performance requirement because the received power is low.

The simulation parameters used to model these degradations are shown in Table 4, as well as the a priori probability of these causes to take place, given by the experts. In this case, $P(w_m^{1*}) = P(w_m^{2*}) = 0\ \forall m$, so $P(w_m) = P(w_m^{1*}) = P(w_m^{2*})$. As can be seen, several values have been used for modeling a single fault cause, corresponding to lighter and more severe degradations.
In this test, seven observable features or KPIs (N = 7) have been used to discern among this set of causes:

• Retainability, given as a percentage. This performance indicator quantifies the ability of the cell to hold the service once accepted by the admission control. It gives an idea of how often a user experiences a call drop.
• Handover success rate (HOSR), given as a percentage. This KPI measures the ability of the network to provide mobility to a user without losing its connection. It can be calculated as the ratio between the number of successful handovers and the total number of HO.
• 95th percentile RSRP, given in dBm. The Reference Signal Received Power (RSRP) is defined as the linear average over the power contributions (in [W]) of the resource elements that carry cell-specific reference signals within the considered measurement frequency bandwidth.
• 5th percentile RSRQ, given in dB. The Reference Signal Received Quality (RSRQ) is a signal quality indicator and is defined as the ratio

$$RSRQ = \frac{N_{PRB} \cdot RSRP}{RSSI}, \tag{8}$$

where $N_{PRB}$ is the number of resource blocks of the E-UTRA carrier RSSI measurement bandwidth and RSSI stands for the total received power within the measurement bandwidth, that is, considering the power from the serving cell, the power of the co-channel serving and non-serving cells, the adjacent channel interference and any possible source of noise. In this paper, RSRQ is expressed in dB.
• 95th percentile SINR, given in dB. The Signal-to-Interference-plus-Noise Ratio (SINR) is defined as the ratio between the power of the desired data signal and the sum of the powers of all inter-cell interferences and the noise. It is expressed in dB.
• 95th percentile distance, given in km. This KPI measures the distance between users and their serving cell, expressed in km. It can be estimated from the transmission delay between them and gives an idea of the cell coverage area.
• Average throughput, given in kbps. In LTE systems, the user throughput depends on the SINR experienced by the user through the following equation, 3GPP (c),

$$T_k = \big(1 - BLER(SINR_k)\big) \cdot \frac{D_k}{TTI}, \tag{9}$$

where BLER is the Block Error Rate obtained from the users’ SINR, $D_k$ is the data block payload in bits of user k and TTI is the transmission time interval.

In order to show the impact a proper modeling may have on the diagnosis performance of the proposed method, the proportion of cases used for the modeling set relative to the testing set has been varied from 25% to 75%. To obtain more reliable results when the number of cases is scarce either in the testing or in the modeling set, 50 repetitions have been made per modeling-to-testing ratio, randomizing the cases assigned to each set. Then, the resulting diagnosis error rates have been averaged over the 50 repetitions.

4.1.2. The standalone classifiers
In this test, for a given technique of automatic diagnosis, two diagnosis models are combined, R = 2, where each of them is provided by a different expert. This test represents the usual case in cellular networks where each troubleshooting expert defines his own set of rules and KPI thresholds to identify problems. When deploying the diagnosis system in a network, according to the proposed method, instead of choosing one single model, the knowledge from both experts is fused by combining two diagnosis models. Furthermore, both diagnosis models comprise the six fault causes and the seven different KPIs described above. That is, $W^1 = W^2$ with $|W^1| = M$ and $N^1 = N^2$.
The artificial intelligence technique used for these tests is based on a Fuzzy Logic Controller (FLC) (Khatib, Barco, Gómez-Andrades, & Serrano, 2015). This system contains rules, which are composed of the antecedent (the “if ...” part) and the consequent (the “then ...” part), the latter being the cause that the fuzzy logic controller assigns to a case if the antecedent is fulfilled by the fuzzified observable features of the case. On the one hand, Table 5 shows the thresholds used in both diagnosis models. The lower limit stands for the value below which a KPI is considered to be low; the upper limit stands for the value above which a KPI is considered to be high. On the other hand, Table 6 shows the if ..., then ... rules that make up each diagnosis model, given by each expert. From left to right, each column below “KPI” in Table 6 corresponds to the KPIs shown in Table 5. H stands for a high value in that KPI and L for a low value. Regarding the numbering of the diagnoses, 1 means excessive downtilt; 2: coverage hole; 3: inter-system interference; 4: too late handover; 5: excessive uptilt and 6: lack of coverage.

4.1.3. Results
Table 7 shows the diagnosis error rates computed when the Max rule is used for combining (Table 2). In Table 7, the average diagnosis error rate and the rate of improvement are shown. This last rate represents the share of repetitions (among the 50 that have been performed) in which the diagnosis error rate from

Table 6
Diagnosis models for the diagnosis systems used in test 1: used rules.

Diagnosis model 1 Diagnosis model 2

KPI Diag. KPI Diag.

L L H L – H L 1 – – H L – H L 1
H H – L L H L 1 L – H H – L H 2
L – – H H – H 2 L – H H H – H 2
L – – H L L H 3 L – – H L L H 3
L L H – L L H 3 L – H – L L H 3
– – H H H H – 4 L – H H L L – 3
– H H – H H – 4 L H – H L L – 3
H – H – H H – 4 – H H H L L H 3
– – H – H H H 4 L L – – – H H 4
H H – – H – H 4 L L – L – – H 4
H H – L – – H 4 L L H – H L – 4
H H – – – H H 4 L L H – H – H 4
H H H – – – H 4 L L L L L L – 4
– – H L H – H 4 – – L H – L H 5
– – H L – L H 4 H H – H – L H 5
L L H L – – H 4 H H L – L L H 5
– – L – L L H 5 – – – L L H L 6
– – L – L H L 6
L – – L L H L 6
– H – L L H L 6
H H – – L H L 6
L L L L L – L 6
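To make the reading of Tables 5 and 6 concrete, the following Python sketch labels each KPI as low or high with the thresholds of Table 5 and then looks for the first matching rule. It is a deliberately crisp simplification and not the fuzzy logic controller used in the paper: membership functions and rule activation degrees are not modeled. The names `THRESHOLDS`, `RULES`, `discretize` and `diagnose`, the `"M"` label for values between the two thresholds, the ASCII `-` standing for the “–” of Table 6, and the sample case in the usage note are all assumptions of this example. Only three rules of diagnosis model 1 are copied, for brevity.

```python
# (name, low threshold, high threshold), in the column order of Table 6.
THRESHOLDS = [
    ("retainability", 0.973, 0.996),
    ("hosr", 0.899, 0.989),
    ("rsrp_dbm", -76.9, -72.4),
    ("rsrq_db", -18.8, -18.2),
    ("sinr_db", 13.0, 14.5),
    ("throughput_kbps", 96.2, 111.67),
    ("distance_km", 0.838, 0.88),
]

# Three rules taken from diagnosis model 1 in Table 6 ("-" means the KPI is not used).
RULES = [
    ("LLHL-HL", 1),  # 1: excessive downtilt
    ("L--HH-H", 2),  # 2: coverage hole
    ("--L-LLH", 5),  # 5: excessive uptilt
]


def discretize(case):
    """Map each KPI value to 'L' (below low), 'H' (above high) or 'M' (in between)."""
    labels = []
    for name, low, high in THRESHOLDS:
        value = case[name]
        labels.append("L" if value < low else "H" if value > high else "M")
    return "".join(labels)


def diagnose(case, rules=RULES):
    """Return the cause of the first rule whose antecedent matches, or None."""
    labels = discretize(case)
    for pattern, cause in rules:
        if all(p == "-" or p == label for p, label in zip(pattern, labels)):
            return cause
    return None
```

For instance, a hypothetical case with retainability 0.95, HOSR 0.85, RSRP −70 dBm, RSRQ −19 dB, SINR 14 dB, throughput 120 kbps and distance 0.80 km is labeled `LLHLMHL` and matches the first rule, so it would be diagnosed as cause 1.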

Table 7
Results of test 1: Combining two versions of the same classifying algorithm.

                                     Modeling-to-testing ratio
                                     25%      50%      75%
Diagnosis syst. 1, average DER       13.81%   13.7%    13.65%
Diagnosis syst. 2, average DER       16.34%   16.13%   16.3%
Ens. Method: Max rule average DER    8.29%    5.92%    5.34%
Rate of improvement                  60%      98%      100%

Table 8
Main parameters of the real LTE network used in test two.

Parameter                                   Configuration
Network Layout                              Urban area
Number of cells                             8679
System bandwidth                            10 MHz
Number of PRBs                              50
Frequency reuse factor                      1
Max. Transmitted Power                      46 dBm
Max. Transmitted Power of UE                23 dBm
Horizontal HPBW (Half-Power Beam Width)     65°
HOM                                         3 dB
KPI Time Period                             Hourly
Number of observed cells                    45
Number of days under observation            6 days per cell (on average)
Size of the dataset                         14,692 labeled cases

the ensemble method is lower than the best one provided by the baseline diagnosis systems. With a modeling-to-testing ratio of 25%, only 60% of the iterations show a better ensemble diagnosis error rate than those of its base diagnosis systems, and the improvement in the average diagnosis error rate is correspondingly the smallest of the three ratios. This result highlights how the scarcity of cases for modeling impacts the classifying performance of the ensemble. However, if the number of cases used for modeling is doubled, 98% of the iterations show a better diagnosis error rate, which also results in a lower average diagnosis error rate. When the modeling-to-testing ratio is set to 75%, every diagnosis error rate provided by the ensemble method is lower than the lowest provided by its components, reaching 5.34% on average. This is approximately 0.39 times the lowest DER achieved by the standalone classifiers (13.65%).
Regarding the DER of the standalone diagnosis systems, it can be seen that it remains nearly constant over the modeling-to-testing ratio. This is because of the randomizing process executed over the labeled cases to divide them into the modeling and testing subsets. When this random permutation is performed a number of times and some subsets (two, in this case) are chosen blindly from this set, the average share of cases labeled with a given cause in each of these subsets tends to the share of that label in the original set. This is a consequence of the law of large numbers. For this reason, the resulting averaged DER of these baseline systems is independent of the size of the subsets made from the original set of cases.

4.2. Combination of different diagnosis systems on a live network

Once the proposed method has been tested with cases provided by a simulator, a second test with cases from a real live LTE network has been performed. In this test, the diagnosis models built from two different machine learning algorithms have been combined.

4.2.1. Scenario
An LTE network composed of more than 8000 different cells providing coverage to almost 4 million people has been analyzed. Its vastness makes many different kinds of cells coexist and also a wide variety of problematic causes come up. Table 8 summarizes the main parameters of the network. Among all the available candidates, 45 random cells have been chosen to represent the network behavior. These cells have been monitored for almost 6 days on average and their KPIs have been stored on an hourly basis. Taking into account that the state of a single cell varies substantially throughout the day due to the traffic fluctuation, several cases have been stored from each cell at different hours, resulting in a total of 14,692 cases. Once these cases were gathered, they were all labeled by the experts, distinguishing four groups of cases (M = 4): three kinds of problematic patterns and the normal cell functioning. The causes of malfunctioning that were found are:

• Overload: This fault cause is mainly distinguished by a high number of RRC connections in the cell, which makes the CPU processing load and the number of HO attempts raise conse-

Table 9
Prior probability of occurrence for the causes considered in test two, P(w_m).

P(Overload)   P(Lack of cov.)   P(Non-operating)   P(Normal)
0.01          0.22              0.47               0.3

Table 10
Diagnosis models for the diagnosis systems used in test 2: used thresholds.

KPI                           Thresholds
Retainability                 [0.99, 0.997]
Accessibility                 [0.992, 0.998]
Number of RRC Connections     [5846, 20703]
Number of ping-pong HO        [18, 83]
Number of bad cov. reports    [217, 1070]
CPU average load [%]          [22.5, 34.45]

quently. The accessibility and retainability KPIs also hold values well below the ones for a cell with normal functioning.
• Lack of coverage: This issue can be identified based on the number of bad coverage evaluation reports, which should be noticeably high.
• Non-operating cell: In this case, and only if the cell is reporting any KPI measurement, most of the reported measurements should be near zero: the retainability, the accessibility, the number of performed HO, the number of RRC connections or the number of coverage reports.

The a priori probability of occurrence of each class has been computed as the average of $P(w_m^{r*})$ over r within this selection (Table 9). From this table it should be noted that there are more faulty cases than healthy ones. This is because a previous, non-perfect detection stage for faulty cases has been applied, which let through some normal cases that now are to be diagnosed as such.
At this point, 20% of the total number of cases (holding the proportion shown in Table 9 between them) were used as a training set for the machine learning algorithms and the rest were used to form the modeling and testing sets in a ratio that, as in Section 4.1.3, was varied along the test.
In this test, six of the most representative KPIs in an LTE network have been chosen to discern between the possible diagnoses, N = 6:

• Retainability: described in Section 4.1.1.
• Accessibility: It is used to show the percentage of connections that have got access to that cell over the KPI time period. A low value in this KPI means that many connections have been blocked during the access procedure.
• Number of RRC connections: It is the number of successfully established RRC connections. Related to the Accessibility KPI, it gives an idea of the number of users served by the cell.
• Number of Ping-Pong Handovers: This KPI counts the number of ping-pong HO that take place in the cell over the measurement time period. A high value in this KPI may mean a bad configuration in the handover policy, as the number of connections that go back and forth between a cell and its neighbors is high for a single call.
• Number of bad coverage reports: It counts the number of times a cell is notified that the UE measured a signal level for which the requirements of Event A2 are met, 3GPP (a). That is, the measured signal level is under a certain threshold.
• CPU average load: It is the average CPU load due to the processes carried out by the cell over the KPI time period.

4.2.2. The standalone classifiers
In this test, the two standalone classifiers used share a similar diagnosis system, a fuzzy-logic controller, which diagnoses the cases according to if ..., then ... rules. The difference resides in the algorithms they use for learning the rules they apply during the diagnosis process. The first is a genetic algorithm (Khatib, Barco, Gómez-Andrades, & Serrano, 2015) and the second is a data driven algorithm (Khatib, Barco, Gómez-Andrades, Muñoz, & Serrano, 2015). In genetic algorithms, three main processes may be distinguished: reproduction, by means of which new individuals are created by either mutation or combination of the previously existing ones; evaluation, or the calculation of the probability of each individual to survive and reproduce; and selection, a process in which some individuals are chosen to survive and reproduce based on the results from the evaluation stage. The data driven algorithm, in turn, first takes a case from the training set and derives the fuzzy rule that covers it. Then, it looks for the cases covered by this rule and scores the rule according to the number of covered cases. New incoming cases are taken until the training set is completely explored. Provided this set of scored rules, the algorithm then fuses them into a lower number of rules in an attempt to maximize the number of cases (and therefore, the score) covered by the resulting fused rules. In these tests, it is assumed that not only faulty cases, but also some normal cases are inputs for the diagnosis stage. This can happen when there is no detection system before the diagnosis system or in the realistic situation in which the detection system has a given probability of error. As in Section 4.1.2, both systems take as possible output all the presented diagnoses, making use of the six KPIs shown above. Table 10 shows the thresholds used for these KPIs to consider them high or low and Table 11 shows the rules each machine learning algorithm has derived from the training set. As in Table 6, H stands for a high value of the KPI and L for a low value. The KPIs are sorted in the same way as in Table 10 and the numbering of the diagnoses is 1: CPU overload; 2: lack of coverage; 3: non-operating cell and 4: normal functioning.

4.2.3. Results
Once the standalone diagnosis systems have been trained, the performance metrics DER, FPR, FNR and OER have been computed for both the standalone diagnosis systems and the rules described in Table 2. In this test, the modeling-to-testing ratio has been varied from 10% to 90% in steps of 10%, making 10 iterations per step. As in Section 4.1.3, a random permutation of the cases used for modeling and testing has been done in each of these 10 iterations. The resulting metrics have then been averaged. Table 12 shows the metrics that result from using a modeling-to-testing ratio of 60%. This ratio has proved to minimize the values of all the metrics in this test. Unlike in Section 4.1, this scenario is made from real cases and contains outliers, that is, atypical cases. As the modeling-to-testing ratio rises, the probability for these outliers to belong to the modeling set also rises, thus inducing the behavior models to deviate from modeling the trend of the typical cases given a fault cause. On the other hand, if no outliers are taken into account during the model-fitting procedure, their fault cause will not be predictable in the second stage and the error rates will also rise.
As can be seen in Table 12, in most cases the combined diagnosis system outperforms the standalone diagnosis systems. Specifically, the median rule achieves the lowest overall error rate, 5.39%, approximately 2/3 of that of the best standalone diagnosis system (8.16%). However, the most relevant improvement takes place in the reduction of the FNR, which has been reduced by 46%. The FNR gives an idea of the number of problematic cases wrongly deemed as normal. It is crucial to make this metric as low as possible, since considering a problematic case as normal may result

Table 11
Diagnosis models for the diagnosis systems used in test 2: used rules.

Diagnosis model 1: Diagnosis model 2:


from genetic algorithm from data driven algorithm

KPI Diagnosis KPI Diagnosis

H H – – L L 1 H H L – L L 1
H – H H L L 1 H H – H L L 1
– H H L – L 2 H – H H L L 1
H – – L – H 2 – L H L – H 2
H – – L H – 2 – H H – H H 2
H – H L – – 2 – – H L H H 2
– L H L – H 2 L – H – H H 2
H H – – H – 2 H H H – H – 2
– H H – H – 2 H L – L L H 2
– – H L H – 2 H – H L – H 2
H – H – H H 2 H – H L H – 2
H – H – H H 2 – L L L L L 3
– L L L L L 3 L L L L H – 4
L L – – H H 4 – L L L H H 4
L L L – H – 4 L L L – H H 4
L L H L L L 4 L – L L H H 4
L L L – – H 4
L – L – H H 4
– – L L H H 4
L L L H – – 4

Table 12
Results of test 2: Combining two different algorithms.

                                   DER      FPR      FNR      OER
Training: Data driven algorithm    2.62%    16.91%   6.47%    11.43%
Training: Genetic algorithm        1.87%    16.61%   2.68%    8.16%
Ensemble method
  Product rule                     2.6%     12.21%   1.32%    6.2%
  Sum rule                         1.78%    11.55%   1.25%    5.59%
  Max rule                         1.78%    11.51%   1.25%    5.57%
  Min rule                         2.05%    11.42%   1.4%     5.84%
  Median rule                      1.78%    10.67%   1.34%    5.39%
  Majority vote rule               1.78%    11.23%   1.25%    5.49%

in the worst case in unnoticed service outages and degradation in the network performance. Regarding this, the proposed method has proved to successfully reduce the FNR. Other indicators are not as critical. For example, mistaking one fault cause for another may be to some extent tolerable (DER); although the actual problem is not the one the operator thinks it is, he is still aware of a problem in the network. Even considering normal cases as faulty may be tolerable as the network performance is not really degraded (FPR).
These results can also be seen in the normalized confusion matrices from the diagnosis methods. Fig. 6a shows the normalized confusion matrix for the FLC using the genetic algorithm for rule learning; Fig. 6b shows the confusion matrix when the data driven algorithm was used for learning the rules and Fig. 6c shows the matrix from applying the median rule with a modeling-to-testing ratio of 60% in the ensemble method. In these matrices, the elements from the fourth column (excluding the main diagonal) account for the false negatives and the elements from the fourth row account for the false negatives. It can be seen how the elements from the main diagonal are reinforced in the ensemble method and how only those diagnoses which are mistaken by both baseline systems are slightly inherited by the latter. Fig. 6c also shows graphically how the FPR and FNR dropped with respect to those from the standalone systems.

5. Future lines of work

• Decision templates. The proposed method does not punish or reward the classifiers according to their performance during the training stage. Going a step further from the idea of the weighted majority vote used in Wei et al. (2014), a score system based on class and classifier aware decision templates applied over the a posteriori probabilities $P(w_m^r \mid x)$ from Eq. (4) could be used to improve the overall accuracy.
• Non-parametric PDFs. The proposal of analytically and parameter-defined PDFs results in a really light way of representing a statistical behavior, as only its parameters must be stored (Table 1) to model a diagnosis system. However, these distributions may limit to some extent the statistical representation of the features from the training cases and they may eventually introduce a source of error in the posterior computation of $P(w_m^r)$ in case these cases follow a distribution that has not been considered. To solve this, future research could focus on using non-parametric PDFs, like the kernel-based ones.
Probability density functions may be classified into parametric and non-parametric functions. The former have analytic expressions and their shape depends on the parameters those functions hold. The latter, however, are defined by means of a kernel function. If all the cases from the dataset are placed along the axis given by a feature of interest (a certain KPI, for example) and a kernel function is centered wherever a point is, an empirical non-parametric PDF would result from averaging the sum of these functions over the number of cases. The main advantage of this method is its accuracy when modeling an empirical distribution. Its main drawback is that, since it is not defined by any parameters, it should be computed and stored point by point, possibly increasing the storage and computing requirements. This method, however, may be used together with the synthetic KPIs described in the next item. First, a reduced set of synthetic KPIs is computed and then their PDFs are accurately estimated with this method.
• Use of synthetic KPIs via feature extraction. As described in Section 3.1, $N^r \times M^* \times R$ PDFs should be estimated in order to model all the feature-class-classifier relations. If any of these factors is relatively high, the computing cost for all these PDFs could be prohibitive. Due to this, working with a reduced group of synthetic/extracted features is proposed in an attempt to map the N original features into $\hat{N}$ synthetic features with $\hat{N} < N$.

Fig. 6. Normalized confusion matrices for the second test.
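The figures of merit of Section 4 can be read off a confusion matrix of raw counts such as those behind Fig. 6. The Python sketch below assumes that rows index the true cause and columns the diagnosed cause; the orientation used in Fig. 6 is not restated here, so this is an assumption, as are the function names `figures_of_merit` and `overall_error_rate` and the counts used in the usage note. The OER follows Eq. (7), with the relative frequencies of normal and faulty cases taken from the matrix itself.

```python
def overall_error_rate(p_normal, fpr, fnr, der):
    """Eq. (7): OER = P_N * FPR + P_PC * (FNR + DER), with P_PC = 1 - P_N."""
    return p_normal * fpr + (1.0 - p_normal) * (fnr + der)


def figures_of_merit(confusion, normal):
    """Compute DER, FPR, FNR and OER from a matrix of raw counts.

    confusion[i][j] = number of cases whose true cause is i and whose
    diagnosed cause is j (assumed orientation); `normal` is the index of
    the normal-functioning class.
    """
    n = len(confusion)
    faulty = [i for i in range(n) if i != normal]
    n_nc = sum(confusion[normal])                     # normal cases
    n_pc = sum(sum(confusion[i]) for i in faulty)     # problematic cases
    n_fp = n_nc - confusion[normal][normal]           # normal diagnosed as faulty
    n_fn = sum(confusion[i][normal] for i in faulty)  # faulty diagnosed as normal
    n_mpc = sum(confusion[i][j] for i in faulty for j in faulty if i != j)
    fpr, fnr, der = n_fp / n_nc, n_fn / n_pc, n_mpc / n_pc
    p_normal = n_nc / (n_nc + n_pc)
    return {
        "DER": der,
        "FPR": fpr,
        "FNR": fnr,
        "OER": overall_error_rate(p_normal, fpr, fnr, der),
    }
```

With the illustrative counts `[[8, 1, 0, 1], [0, 18, 1, 1], [1, 0, 38, 1], [1, 2, 0, 27]]` and `normal=3`, there are 30 normal and 70 faulty cases, and 9 of the 100 cases are wrong in some way, so the OER equals 0.09. As a consistency check on Eq. (7), `overall_error_rate(0.3, 10.67, 1.34, 1.78)`, which uses P(Normal) from Table 9 and the median-rule row of Table 12 in percent, gives about 5.39, in agreement with the OER reported in that row.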

In recent years, mainly driven by the rise of data mining, many methods for dimensionality reduction have arisen. Among these, it is worth highlighting Principal Component Analysis (PCA) (Jolliffe, 2002). In an N-dimensional vector space, the simplest version of PCA (linear PCA) is a technique that finds the mutually uncorrelated vectors onto which the projection of the samples generates the highest variances. The result is a set of orthogonal vectors sorted in descending order of achieved variance. The first of these vectors is that onto which the variance of the projection of the samples is maximum. In this sense, the original KPIs constitute the basis of the N-dimensional vector space, whereas the N̂ synthetic KPIs represent the orthogonal vectors with the highest variance. To be rigorous, up to N synthetic orthogonal KPIs may be computed. However, only a small set of them, the first N̂, is enough to account for most of the variance of the data. By applying this technique, based on the eigenvalue decomposition of the covariance matrix of the original KPIs, these can be mapped into N̂ synthetic KPIs, preserving most of the information contained in the former.

6. Conclusions

A hybrid ensemble of classifiers, devised to merge expert knowledge from different sources, has been presented and assessed in the context of fault cause diagnosis in cellular networks. It allows the expertise of several troubleshooting experts and the knowledge contained in databases of previously diagnosed cases to be combined in order to develop a more accurate diagnosis system.

Unlike the common approach of hybrid ensembles, based on the majority vote of their baseline components, this work proposes a hybrid ensemble of classifiers obtained from the combination of the statistical behavior models of the baseline diagnosis systems. With this approach, the partial diagnoses of the standalone classifiers are obtained and then combined by applying a few algebraic rules, without the classifiers themselves having to assess every case under test, which reduces the computational cost with respect to usual hybrid ensembles of classifiers.

The method has been tested with two different sources of cases: cases provided by an LTE RAN simulator and cases gathered from a live LTE network. Likewise, two use cases have been assessed: the combination of diagnosis models designed by two different network troubleshooting experts and the combination of two diagnosis systems using different learning algorithms. The proposed method has outperformed its base components in both tests in terms of diagnosis error rate, proving to be an effective tool for fault cause diagnosis in current and future self-healing networks.
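
As a supplement to the PCA-based reduction of KPIs described earlier, the mapping from the original KPIs to a smaller set of synthetic KPIs can be sketched in a few lines. This is only an illustrative sketch, not the implementation used in this work: the function name `pca_reduce`, the synthetic input data and the choice of two retained components are assumptions made for the example. It follows the description in the text: eigenvalue decomposition of the covariance matrix of the KPIs, eigenvectors sorted by descending variance, and projection onto the first few of them.

```python
import numpy as np


def pca_reduce(kpis, n_components):
    """Project KPI samples (one row per case) onto the top principal components.

    Hypothetical helper written for illustration only.
    """
    kpis = np.asarray(kpis, dtype=float)
    # Center each KPI so the covariance matrix describes variation around the mean.
    centered = kpis - kpis.mean(axis=0)
    # Covariance matrix of the original KPIs (columns are variables).
    cov = np.cov(centered, rowvar=False)
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Re-sort so the direction with the highest variance comes first.
    order = np.argsort(eigvals)[::-1]
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    # Keep only the first n_components orthogonal directions.
    components = eigvecs[:, :n_components]
    # Synthetic KPIs: projections of the centered samples onto those directions.
    synthetic = centered @ components
    # Fraction of the total variance retained by the kept components.
    explained = eigvals[:n_components].sum() / eigvals.sum()
    return synthetic, components, explained


# Assumed toy data: 6 correlated "KPIs" generated from 2 hidden factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 6))
samples = latent @ mixing + 0.05 * rng.normal(size=(500, 6))

synthetic_kpis, basis, explained_ratio = pca_reduce(samples, n_components=2)
print(synthetic_kpis.shape, round(float(explained_ratio), 4))
```

In this toy setting almost all of the variance is carried by the two retained components, which mirrors the remark that a small set of synthetic KPIs is enough to account for most of the variance of the data. With real KPIs, the number of retained components would have to be chosen by inspecting the retained-variance fraction.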

Acknowledgment

This work has been partially funded by Optimi-Ericsson, Junta de Andalucía (Consejería de Ciencia, Innovación y Empresa, Ref. 59288 and Proyecto de Investigación de Excelencia P12-TIC-2905) and ERDF.

References

3GPP (a). Evolved Universal Terrestrial Radio Access (E-UTRA) Radio Resource Control (RRC); Protocol Specification, Rel-13, Version 13.2.0 (2015-12). TS 36.331. 3rd Generation Partnership Project.
3GPP (b). Feasibility study for Further Advancements for E-UTRA (LTE-Advanced), Rel-13, Version 13.0.0 (2015-12). TR 36.912. 3rd Generation Partnership Project.
3GPP (c) (May 2004). OFDM-HSDPA System level simulator calibration (R1-040500). 3GPP TSG-RAN WG1 37. 3rd Generation Partnership Project (3GPP).
3GPP (d). Self-Organizing Networks (SON); Concepts and requirements, Rel-13, Version 13.0.0 (2015-12). TS 32.500. 3rd Generation Partnership Project.
3GPP (e). Self-Organizing Networks (SON); Self-Healing concepts and requirements, Rel-13, Version 13.0.0 (2015-12). TS 32.541. 3rd Generation Partnership Project.
Barco, R., Díez, L., Wille, V., & Lázaro, P. (2009). Automatic diagnosis of mobile communication networks under imprecise parameters. Expert Systems with Applications, 36(1), 489–500. doi:10.1016/j.eswa.2007.09.030.
Barco, R., Lazaro, P., Diez, L., & Wille, V. (2008). Continuous versus discrete model in autodiagnosis systems for wireless networks. IEEE Transactions on Mobile Computing, 7(6), 673–681. doi:10.1109/TMC.2008.23.
Barco, R., Lázaro, P., Wille, V., Díez, L., & Patel, S. (2009). Knowledge acquisition for diagnosis model in wireless networks. Expert Systems with Applications, 36(3), 4745–4752. doi:10.1016/j.eswa.2008.06.042.
Barco, R., Lázaro, P., & Muñoz, P. (2012). A unified framework for Self-Healing in wireless networks. IEEE Communications Magazine, 50(12), 134–142. doi:10.1109/MCOM.2012.6384463.
Begum, S., Chakraborty, D., & Sarkar, R. (2015). Cancer classification from gene expression based microarray data using SVM ensemble. In 2015 international conference on condition assessment techniques in electrical systems (CATCON) (pp. 13–16). doi:10.1109/CATCON.2015.7449500.
Breiman, L. (1996). Bagging predictors. In Machine learning (pp. 123–140).
Ciocarlie, G., Lindqvist, U., Nováczki, S., & Sanneck, H. (2013). Detecting anomalies in cellular networks using an ensemble method. In Proceedings of the 9th international conference on network and service management (CNSM 2013) (pp. 171–174). doi:10.1109/CNSM.2013.6727831.
Dasarathy, B., & Sheela, B. V. (1979). A composite classifier system design: concepts and methodology. Proceedings of the IEEE, 67(5), 708–713. doi:10.1109/PROC.1979.11321.
Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119–139. doi:10.1006/jcss.1997.1504.
Gandhi, I., & Pandey, M. (2015). Hybrid ensemble of classifiers using voting. In 2015 international conference on green computing and internet of things (ICGCIoT) (pp. 399–404). doi:10.1109/ICGCIoT.2015.7380496.
Gómez-Andrades, A., Muñoz Luengo, P., Khatib, E., de la Bandera Cascales, I., Serrano, I., & Barco, R. (2015). Methodology for the design and evaluation of Self-Healing LTE networks. IEEE Transactions on Vehicular Technology, PP(99), 1–1. doi:10.1109/TVT.2015.2477945.
Gómez-Andrades, A., Muñoz, P., Serrano, I., & Barco, R. (2016). Automatic root cause analysis for LTE networks based on unsupervised techniques. IEEE Transactions on Vehicular Technology, 65(4), 2369–2386. doi:10.1109/TVT.2015.2431742.
Jacobs, R. A., Jordan, M. I., Nowlan, S. J., & Hinton, G. E. (1991). Adaptive mixtures of local experts. Neural Computing, 3(1), 79–87. doi:10.1162/neco.1991.3.1.79.
Jolliffe, I. (2002). Principal component analysis. Springer Series in Statistics (2nd ed.). Springer-Verlag New York.
Khatib, E. J., Barco, R., Gómez-Andrades, A., Muñoz, P., & Serrano, I. (2015). Data mining for fuzzy diagnosis systems in LTE networks. Expert Systems with Applications, 42(21), 7549–7559. doi:10.1016/j.eswa.2015.05.031.
Khatib, E. J., Barco, R., Gómez-Andrades, A., & Serrano, I. (2015). Diagnosis based on genetic fuzzy algorithms for LTE Self-Healing. IEEE Transactions on Vehicular Technology. doi:10.1109/TVT.2015.2414296.
Kittler, J., Hatef, M., Duin, R. P. W., & Matas, J. (1998). On combining classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3), 226–239. doi:10.1109/34.667881.
Kuncheva, L. (2002). A theoretical study on six classifier fusion strategies. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(2), 281–286. doi:10.1109/34.982906.
Liu, H., Chen, G., Song, G., & Han, T. (2009). Analog circuit fault diagnosis using bagging ensemble method with cross-validation. In International conference on mechatronics and automation, 2009. ICMA 2009 (pp. 4430–4434). doi:10.1109/ICMA.2009.5246675.
Mehlführer, C., Wrulich, M., Colom Ikuno, J., Bosanska, D., & Rupp, M. (2009). Simulating the long term evolution physical layer. In Proc. of 17th European signal processing conference (EUSIPCO).
Muñoz, P., de la Bandera, I., Ruíz, F., Luna-Ramírez, S., Barco, R., Toril, M., et al. (2011). Computationally-efficient design of a dynamic system-level LTE simulator. International Journal of Electronics and Telecommunications, 57(3), 347–358. doi:10.1155/2012/802606.
Nováczki, S. (2013). An improved anomaly detection and diagnosis framework for mobile network operators. In 2013 9th international conference on the design of reliable communication networks (DRCN) (pp. 234–241).
Shen, H.-B., & Chou, K.-C. (2006). Ensemble classifier for protein fold pattern recognition. Bioinformatics, 22(14), 1717–1722. doi:10.1093/bioinformatics/btl170.
Szilágyi, P., & Nováczki, S. (2012). An automatic detection and diagnosis framework for mobile communication systems. IEEE Transactions on Network and Service Management, 9(2), 184–197. doi:10.1109/TNSM.2012.031912.110155.
Wei, H., Lin, X., Xu, X., Li, L., Zhang, W., & Wang, X. (2014). A novel ensemble classifier based on multiple diverse classification methods. In 2014 11th international conference on fuzzy systems and knowledge discovery (FSKD) (pp. 301–305). doi:10.1109/FSKD.2014.6980850.
Wiezbicki, T., & Ribeiro, E. P. (2016). Sensor drift compensation using weighted neural networks. In 2016 IEEE conference on evolving and adaptive intelligent systems (EAIS) (pp. 92–97). doi:10.1109/EAIS.2016.7502497.
Wu, X., Kumar, V., Ross Quinlan, J., Ghosh, J., Yang, Q., Motoda, H., et al. (2008). Top 10 algorithms in data mining. Knowledge and Information Systems, 14(1), 1–37. doi:10.1007/s10115-007-0114-2.
Yuksel, S., Wilson, J., & Gader, P. (2012). Twenty years of mixture of experts. IEEE Transactions on Neural Networks and Learning Systems, 23(8), 1177–1193. doi:10.1109/TNNLS.2012.2200299.
