Combination of Multiple Diagnosis Systems in Sel - 2016 - Expert Systems With Ap
Article info

Article history:
Received 8 January 2016
Revised 20 July 2016
Accepted 21 July 2016
Available online 22 July 2016

Keywords:
LTE
Self-healing
Root cause analysis
Self-organizing networks (SON)
Hybrid ensemble classifier
Automatic fault identification

Abstract

The Self-Organizing Networks (SON) paradigm proposes a set of functions to automate network management in mobile communication networks. Within SON, the purpose of Self-Healing is to detect cells with service degradation, diagnose the fault cause that affects them, rapidly compensate for the problem with the support of neighboring cells, and repair the network by performing recovery actions.

The diagnosis phase can be designed as a classifier. In this context, hybrid ensembles of classifiers enhance the diagnosis performance of expert systems of different kinds by combining their outputs. In this paper, a novel hybrid ensemble of classifiers is proposed as a two-step procedure: a modeling stage for the baseline classifiers and an application stage, in which the partial diagnoses are actually combined. Using statistical models of the baseline classifiers allows an immediate ensemble diagnosis without running and querying each of them individually, which results in a very low computational cost in the execution stage.

Results show that the proposed method performs significantly better than its standalone components in terms of diagnosis error rate, both on simulated data and on cases from a live LTE network. Furthermore, the method relies on concepts that are not tied to a particular mobile communication technology, so it can be applied both to well-established cellular networks, such as UMTS, and to recent and forthcoming technologies, such as LTE-A and 5G.
© 2016 The Authors. Published by Elsevier Ltd.
This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
tomatic diagnosis technique and a diagnosis model. The first is an artificial intelligence system that outputs a diagnosis taking a set of symptoms (e.g., KPIs) from a test case as its input. The second represents the knowledge a human expert would have about the underlying relations between the symptoms and the fault causes, and may take different forms depending on the diagnosis technique it is intended to work with. For example, a diagnosis model may consist of the parameters (e.g., prior probabilities and probability density functions) required by a given diagnosis technique (e.g., a Bayesian classifier), or of a set of rules for other techniques (e.g., Case-Based Reasoning, CBR). As can be seen in this figure, the diagnosis model may be built from a set of training cases by means of a machine learning algorithm, or by troubleshooting experts gathering their knowledge. The proposed method aims to combine the knowledge acquired by any number and kind of diagnosis models and automatic diagnosis techniques in order to reduce the errors in fault detection and diagnosis.

2.2. Automated diagnosis from the classification theory

A diagnosis system is a method that, given a set of indicators or symptoms (called a case hereafter), intends to infer the cause that provoked them. In this sense, a diagnosis system acts as a classifying system in which the attributes of the cases to be classified correspond to the symptoms of the case to be diagnosed, and the classes to be assigned correspond to the causes to be inferred. This issue has long been investigated in data mining theory (Wu et al., 2008), and many types of classifiers have been developed over the years in an attempt to extract the maximum information the cases under diagnosis can provide. However, no algorithm has so far proven to be clearly better than the rest for all kinds of input data. One reason for the increasing research effort is that the performance of a classifier normally depends on the nature and distribution of the data it has to work with. For this reason, the present paper focuses not only on combining different diagnosis models but also on offering the possibility of combining multiple classifiers in the form of automatic diagnosis techniques.

Let us assume we have a set of M fault causes to diagnose and R diagnosis systems (either diagnosis models or techniques) to combine, and that each of these systems can output a subset of these causes, namely, $W^r$ for system r. In this scenario, the set of causes a diagnosis system can identify may differ from one system to another. This can be seen in (1), where each row stands for a $W^r$ and the element $w_m^r$ stands for the mth fault cause, diagnosed by the rth system. Accordingly, each row may be different from the others.

$$
\begin{bmatrix}
w_1^1 & \cdots & w_m^1 & \cdots & w_M^1 \\
\vdots & \ddots & \vdots & & \vdots \\
w_1^r & \cdots & w_m^r & \cdots & w_M^r \\
\vdots & & \vdots & \ddots & \vdots \\
w_1^R & \cdots & w_m^R & \cdots & w_M^R
\end{bmatrix}
\tag{1}
$$

In a diagnosis system, a case, $\mathbf{x}$, is characterized by its symptoms, $x_n$, where $\mathbf{x} = \{x_1, x_2, \ldots, x_N\}$, with a total of N possible symptoms. However, each diagnosis system may consider only a subset of these symptoms, namely, $N^r$ for the diagnosis system r.

In the context of diagnosis systems for mobile communication networks, a case corresponds to an observation or measurement from the network; a symptom may be an event counter, a Key Performance Indicator (KPI), a call trace or an alarm; and the causes are seen as the network states, among which the normal state and several fault states may be distinguished. In this paper, some results from the theory of classifiers are used, extended and applied in this context in an attempt to combine the knowledge acquired by these R diagnosis systems, developing a more reliable and accurate root cause analysis system for communication networks.

2.3. State-of-the-art in ensemble-based classification algorithms

This section provides a brief survey of the most recently proposed ensemble-based systems, most of which have been used in classification tasks in areas not related to mobile communications.

Ensembles of classifiers may nowadays be divided into homogeneous and heterogeneous (or hybrid) ensembles. The former put together instances of classifiers of the same kind, e.g., several k-Nearest Neighbor (kNN) classifiers. Conversely, heterogeneous ensembles put together classifiers of different kinds, e.g., a kNN and a NN (Neural Network). This is the scope of the present work, as heterogeneous ensembles also allow the combination of different sources of expert knowledge within a single enhanced diagnosis system.

One of the earliest works on ensemble methods proposed to partition the feature space (i.e., the vector space in which the features of the cases to be diagnosed are defined) and to assign each part to a different classifier, which is supposed to be the best for that subset of cases (Dasarathy & Sheela, 1979). This idea has been widely explored and has given rise to the so-called mixture of experts algorithm (Jacobs, Jordan, Nowlan, & Hinton, 1991; Yuksel, Wilson, & Gader, 2012), which is the paradigm for the classifier-selection type of ensemble methods. Under this approach, only one classifier works at a time, and its selection is determined by the partition to which the case under test belongs.

Conversely, in classifier-fusion methods all classifiers are usually trained over the entire feature space. The classifier combination process merges the individual classifiers to obtain a system that outperforms the standalone classifiers. This is the basis for the widely used bagging and boosting predictors (Breiman, 1996; Freund & Schapire, 1997), AdaBoost being an example of the latter and one of the best-known and most widely used classification algorithms today. Classifier-fusion methods can also be divided into those which work with classification labels only and those which use a continuous-valued output of each classifier for every class. In the latter case, the outputs can be seen as the support an expert gives to a class in terms of the class-conditional posterior probabilities (Kuncheva, 2002).

Some examples of ensemble methods used as enhanced systems for fault disclosure, with many different purposes, can be found in the literature. In Liu et al. (2009), a homogeneous ensemble of neural networks with cross-validation is proposed for fault diagnosis of analog circuits with tolerance. In Shen and Chou (2006), several kNN classifiers are put together in a majority-vote ensemble to classify the patterns that several proteins may exhibit when folded. In Begum et al. (2015), a homogeneous ensemble of SVMs (Support Vector Machines) is proposed to identify different types of cancer from a genetic analysis. Wiezbicki and Ribeiro (2016) propose a homogeneous ensemble of neural networks, combined by means of a weighted majority vote, in a sensor network for the classification of gases.

Regarding the most recent works on hybrid ensembles of classifiers, in Wei et al. (2014) n ensembles are built by combining 3n baseline classifiers. Each ensemble comprises three supervised methods: a decision tree, a support vector machine and a kNN algorithm. In each ensemble, the diagnoses from these baseline classifiers are fused by applying a weighted majority vote, where each vote is weighted by the performance each individual classifier
D. Palacios et al. / Expert Systems With Applications 64 (2016) 56–68 59
Fig. 2. Proposed method for combining diagnosis systems. Stage 1: Construction of the behavior models.
shows during a prior training stage. Then, the n resulting diagnoses are combined into a final diagnosis by applying a non-weighted majority vote. In this case, all the baseline classifiers must be supervised diagnosis systems, as their performance must be known beforehand in order to weight their votes in the first stage. In contrast, the proposed method allows the user to combine any kind of diagnosis system, either supervised or unsupervised. Even more importantly, regarding the operation stage, in Wei et al. (2014) every new case to be diagnosed must pass through two steps, one of which is made up of 3n systems that must each output a diagnosis first, resulting in a high computational cost. In the proposed method, however, test cases go through only one step, which, furthermore, consists only of some algebraic calculations. Once the training stage has been performed, new cases are diagnosed at minimal computational cost.

As for Gandhi and Pandey (2015), a two-step method is again proposed. The first step consists of a learning stage for the base classifiers and the second step consists of a combining stage based on majority voting. Again, as in Wei et al. (2014), every baseline classifier is required to diagnose every new case first in the application step, which results in a high computational cost compared to that of the application (test) step in the proposed method.

In the context of cellular networks, Ciocarlie et al. (2013) propose a hybrid ensemble of classifiers to detect anomalies in the performance indicators of a cell. This work is focused on fault detection. In contrast, the proposed work does not just detect a performance degradation, but identifies the fault cause behind it. Regarding its implementation, the method of Ciocarlie et al. (2013) relies on a pool of models. New models are added to this pool whenever a change in the configuration parameters of the network takes place. A number of $N_{CM} \times N_{univariate} \times N_{KPI}^{U} + N_{multivariate} \times N_{KPI}^{G}$ models, and thus instances of automatic techniques, must be assessed for every single new case under test. In this expression, $N_{CM}$ stands for the number of sets of network configuration parameters considered; $N_{univariate}$ and $N_{KPI}^{U}$ stand for the number of univariate techniques considered and the number of KPIs acting as their input in each model, respectively; and $N_{multivariate}$ and $N_{KPI}^{G}$ stand for the number of multivariate techniques used and the number of groups of KPIs considered in each model, respectively. Thus, in Ciocarlie et al. (2013), as in Wei et al. (2014) and Gandhi and Pandey (2015), a large number of baseline classifiers must be queried before an ensemble decision can be made. And again, as in those works, all the partial decisions meet at a combining stage based on a weighted majority vote.

To the authors' knowledge, no ensemble method for fault cause diagnosis in cellular networks has been proposed to date.

3. Method for combining multiple automatic diagnosis systems

In this section, a method is proposed for combining the knowledge acquired by any number and kind of standalone automatic diagnosis systems by means of a classifier-fusion scheme. The proposed method consists of two stages: the construction of the behavior models of the automatic diagnosis systems (Section 3.1), and the combination of these models in order to make a more accurate diagnosis of the cases from a testing set (Section 3.2). This can be seen in Figs. 2 and 5. Before this method can be applied, two sets of N-dimensional cases must be distinguished: the modeling set and the testing set, where each of these N dimensions stands for a working KPI. The modeling set is used in the first stage and the testing set in the second.

3.1. Construction of the behavior models

The baseline diagnosis systems are combined by mixing their behavior models, which need to be extracted first. Once the diagnosis model of each diagnosis system has been built (either from training cases via a machine learning method (Khatib, Barco, Gómez-Andrades, Muñoz, & Serrano, 2015), or from the experts' knowledge (Gómez-Andrades et al., 2016)), each diagnosis system can start the classification (Fig. 1). In this stage, every case from the modeling set is diagnosed by the R systems. That is, each system assigns to each case one of the M possible fault causes; in particular, one of the causes that system can discern. This can be seen in Fig. 2, where the case $\mathbf{x}$ acts as the input for the R systems and, in turn, they assign it R diagnosis labels. If system r diagnoses the case $\mathbf{x}$ with cause m, this case receives the label $w_m^{r*}$. In this way, each diagnosis system makes a different partition of the modeling set into $|W^{r*}|$ disjoint subsets, whose maximum is $|W^r|$, that is, the number of causes that system considers (Fig. 3), where $|A|$ is the number of elements in the set A. This finally leads to the identification of $M^*$ different causes, $M^*$ being the number of causes in the union of $W^{r*}$ over r, with $M^* \le M$. Accordingly, a new matrix may be derived from (1) by substituting every row (i.e., every $W^r$) with its corresponding $W^{r*}$. Each row would represent one of the partitions of the modeling set, and each column would represent how a cause "is seen" by each diagnosis system in terms of the KPIs exhibited by the cases belonging to that $w_m^r$.

It should be noted that each of these $M^*$ subsets contains a number of $|N^r|$-dimensional cases. At this point, the behavior of the diagnosis system r is modeled through the estimation of the
statistical distributions of the $N^r$ KPIs for the cases belonging to $W^{r*}$. That is, the behavior of each diagnosis system is modeled by means of $N^r \times M^*$ PDFs. The estimated statistical distribution of the nth KPI for the subset of cases diagnosed as m by the diagnosis system r is $p(x_n \mid w_m^{r*})$. The PDF that estimates each of these distributions is chosen according to the maximum likelihood (ML) criterion. To do so, several families of PDFs are considered in the fitting procedure (Table 1). In a first step, the distribution of the KPI $x_n$ for the cases labeled as $w_m^{r*}$ is fitted according to the ML criterion with each of the considered families of PDFs. This results in a set of candidates for estimating its distribution. These PDFs are then sorted by their likelihood, and the one with the maximum value is chosen as the estimate for the KPI.

Fig. 3. Modeling set divided into different subsets by means of two different partitions: on the left, the partition made by the first diagnosis system, with $W^1 = \{w_1^1, w_2^1, w_3^1\}$ and $|W^1| = |W^{1*}| = 3$; on the right, the partition made by diagnosis system R, with $W^R = \{w_1^R, w_2^R, w_3^R\}$ and $W^{R*} = \{w_1^{R*}, w_2^{R*}\}$. In this last case, diagnosis system R only diagnosed causes 1 and 2, although it is also able to identify fault cause 3.

Table 1
Families of PDFs considered for the estimation of $p(x_n \mid w_m^{r*})$.
Distribution | PDF | Parameters
Beta | $\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\, x^{a-1}(1-x)^{b-1}$ | $a, b$
Normal | $\frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right)$ | $\mu, \sigma$
Log-normal | $\frac{1}{x\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln x-\mu)^2}{2\sigma^2}\right)$ | $\mu, \sigma$
Exponential | $\lambda \exp(-\lambda x)$ | $\lambda$
Gen. extreme value | $\frac{1}{\sigma}\, t(x)^{\xi+1} \exp(-t(x))$, with $t(x) = \left(1+\xi\frac{x-\mu}{\sigma}\right)^{-1/\xi}$ if $\xi \neq 0$ and $t(x) = \exp\left(-\frac{x-\mu}{\sigma}\right)$ if $\xi = 0$ | $\mu, \sigma, \xi$
T-location | $\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sigma\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left[1+\frac{1}{\nu}\left(\frac{x-\mu}{\sigma}\right)^2\right]^{-\frac{\nu+1}{2}}$ | $\nu, \mu, \sigma$
Nakagami | $\frac{2m^m}{\Gamma(m)\,\Omega^m}\, x^{2m-1} \exp\left(-\frac{m}{\Omega}x^2\right)$ | $m, \Omega$
Gamma | $\frac{1}{\Gamma(k)\,\theta^k}\, x^{k-1} \exp\left(-\frac{x}{\theta}\right)$ | $k, \theta$
Logistic | $\frac{\exp\left(-\frac{x-\mu}{s}\right)}{s\left(1+\exp\left(-\frac{x-\mu}{s}\right)\right)^2}$ | $\mu, s$
Log-logistic | $\frac{(\beta/\alpha)(x/\alpha)^{\beta-1}}{\left(1+(x/\alpha)^{\beta}\right)^2}$ | $\alpha, \beta$
Weibull | $\frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} \exp\left(-\left(\frac{x}{\lambda}\right)^k\right)$ | $\lambda, k$
Rayleigh | $\frac{x}{\sigma^2} \exp\left(-\frac{x^2}{2\sigma^2}\right)$ | $\sigma$
Rice | $\frac{x}{\sigma^2} \exp\left(-\frac{x^2+\nu^2}{2\sigma^2}\right) I_0\left(\frac{x\nu}{\sigma^2}\right)$ | $\nu, \sigma$

The reason for considering these families of PDFs is to obtain the best estimate of the distribution of the KPI $x_n$ given that the case belongs to $w_m^{r*}$. Fig. 4a shows a normalized histogram of the KPI "95th percentile RSRP" for the cases labeled as $w_m^{r*}$. In this figure, two families of PDFs have been used in an attempt to fit the underlying histogram: the normal and the generalized extreme value. As can be seen, the latter fits it better, resulting in a higher value in a likelihood-ratio test.

While some KPIs are counters and do not have an upper limit, others are inherently bounded, as they are defined as a ratio. Normally, the beta PDF is used to fit these KPIs, usually limited between zero and one (Barco, Lazaro, Diez, & Wille, 2008). KPIs like retainability or accessibility often reach these extreme values, which makes the fitted beta present asymptotes at these values. To avoid this issue, the beta function used here, $\beta'$, differs slightly from the one in Table 1, $\beta$. In this case,

where $\beta(x)$ stands for the distribution fitted to a set with no extreme values; $P_0$ and $P_1$ stand for the relative frequency of cases with value 0 and 1, respectively; $\delta$ stands for the Dirac delta; and $h_\beta$ stands for the step (the resolution) used when computing $\beta'$. This can be seen in Fig. 4b, where a normalized histogram for the KPI retainability is shown.

3.2. Combination of behavior models

This stage uses the cases from the testing set. In the previous stage, the estimated functions have been seen as conditional probability density functions, that is, functions that express how the KPIs are distributed over the cases diagnosed with a given cause by a given system. However, this set of functions may also be seen as likelihood functions simply by changing the approach. From this point of view, the function depends on $w_m^{r*}$, given that an observation of the random variable $x_n$ (that is, the nth KPI) has taken place.

Now, assuming the KPIs are independent of each other, a joint probability function of $w_m^{r*}$, that is, $p(\mathbf{x} \mid w_m^{r*})$, may be written as

$$
p(\mathbf{x} \mid w_m^{r*}) = \prod_{n \in N^r} p(x_n \mid w_m^{r*}).
\tag{3}
$$

Given (3), and assuming that the prior probability of each cause, $P(w_m^{r*})$, is given by $|w_m^{r*}| / |W^{r*}|$, the a posteriori probability that diagnosis system r diagnoses a case with cause m given its KPIs $\mathbf{x}$ (i.e., $P(w_m^r \mid \mathbf{x})$) can be calculated by simply applying Bayes' theorem. That is,

$$
P(w_m^r \mid \mathbf{x}) =
\begin{cases}
\dfrac{p(\mathbf{x} \mid w_m^{r*})\,P(w_m^{r*})}{\sum_{w_i^{r*} \in W^{r*}} p(\mathbf{x} \mid w_i^{r*})\,P(w_i^{r*})} & \text{if } P(w_m^{r*}) > 0 \\
0 & \text{if } P(w_m^{r*}) = 0
\end{cases}
\tag{4}
$$

At this point, some diagnosis systems may not have diagnosed a given cause, as seen in Fig. 3. In such a case, $P(w_m^{r*})$, and thus (4), would be equal to zero. In any case, $M \times R$ a posteriori probabilities may be distinguished. Fig. 5 shows this when a case $\mathbf{y}$ from the testing subset is to be diagnosed. As can be seen in this figure, the KPIs of the case $\mathbf{y}$ act as input values to the behavior models of the R diagnosis systems, i.e., the probability functions $p(\mathbf{y} \mid w_m^{r*})$ for $w_m^{r*} \in W^{r*}$ and $r = 1, \ldots, R$. Then, the a posteriori probabilities $P(w_m^r \mid \mathbf{y})$ are computed from these together with $P(w_m^{r*})$ by means of Bayes' theorem.

Now, these $M \times R$ a posteriori probabilities, together with the prior probabilities, can be combined over R using some algebraic functions, producing M probabilities of the kind $P(w_m \mid \mathbf{y})_{Rule_t}$ per function used, where m again stands for the cause and t is an index for the rule used in the combination, that is,

$$
P(w_m \mid \mathbf{y})_{Rule_t} = f_{Rule_t}\left(P(w_m^1 \mid \mathbf{y}), \ldots, P(w_m^R \mid \mathbf{y});\; P(w_m)\right),
\tag{5}
$$

where $P(w_m)$ is defined as the average of $P(w_m^{r*})$ over r.

Some rules for combining the a posteriori probabilities given by several classifying systems are proposed in Kittler, Hatef, Duin, and Matas (1998) and studied further in Kuncheva (2002). In the former, those rules are derived from a maximum a posteriori (MAP) estimation in a multiple random variable scenario, in an attempt to lighten the effort of computing several joint probability density functions. These rules are summarized in Table 2.
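The combination stage described above can be sketched in Python. This is a minimal, hypothetical implementation, not the authors' code: the `BehaviorModel` container, the use of a normal PDF for every KPI (in place of whichever Table 1 family won the ML fit) and the exact form of the rules are assumptions. The rules follow the sum, max, product and median rules of Kittler et al. (1998) and may differ in detail from Table 2. Each system's posteriors follow (3) and (4), $P(w_m)$ is the average prior defined after (5), and the final label is the arg max of the combined score, as in (6). Note that no baseline diagnosis system is queried here; only the stored PDFs are evaluated.

```python
import math
from statistics import median
from typing import Callable, Dict, List, Sequence, Tuple

Pdf = Callable[[float], float]


def normal_pdf(mu: float, sigma: float) -> Pdf:
    """Normal PDF; a stand-in for whichever Table 1 family won the ML fit."""
    def pdf(x: float) -> float:
        z = (x - mu) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return pdf


class BehaviorModel:
    """Hypothetical Stage 1 output for one diagnosis system r."""

    def __init__(self, kpis: Sequence[int],
                 causes: Dict[str, Tuple[float, Sequence[Pdf]]]) -> None:
        self.kpis = list(kpis)   # indices of the KPIs in N^r
        self.causes = causes     # cause -> (P(w_m^{r*}), one PDF per KPI in N^r)

    def posteriors(self, y: Sequence[float]) -> Dict[str, float]:
        """Eqs. (3)-(4): product of per-KPI PDFs times the prior, normalized."""
        joint = {
            cause: prior * math.prod(pdf(y[n]) for n, pdf in zip(self.kpis, pdfs))
            for cause, (prior, pdfs) in self.causes.items()
        }
        total = sum(joint.values())
        return {c: (v / total if total > 0 else 0.0) for c, v in joint.items()}


def combine(models: List[BehaviorModel], y: Sequence[float],
            rule: str = "max") -> Tuple[str, Dict[str, float]]:
    """Eq. (5) with Kittler-style rules, followed by the arg max of Eq. (6)."""
    R = len(models)
    causes = sorted({c for m in models for c in m.causes})
    post = [m.posteriors(y) for m in models]
    scores: Dict[str, float] = {}
    for c in causes:
        # P(w_m^r | y) is 0 for a cause that system r never diagnosed (Eq. (4)).
        p = [pr.get(c, 0.0) for pr in post]
        # P(w_m): average of the per-system priors over r.
        prior = sum(m.causes[c][0] if c in m.causes else 0.0 for m in models) / R
        if rule == "sum":
            scores[c] = (1 - R) * prior + sum(p)
        elif rule == "max":
            scores[c] = (1 - R) * prior + R * max(p)
        elif rule == "product":
            scores[c] = math.prod(p) / prior ** (R - 1) if prior > 0 else 0.0
        elif rule == "median":
            scores[c] = median(p)
        else:
            raise ValueError(f"unknown rule: {rule}")
    decision = max(scores, key=scores.get)
    return decision, scores


# Usage: two hypothetical systems, one using KPIs 0 and 1, one using only KPI 0.
sys1 = BehaviorModel([0, 1], {"A": (0.5, [normal_pdf(0, 1), normal_pdf(0, 1)]),
                              "B": (0.5, [normal_pdf(3, 1), normal_pdf(3, 1)])})
sys2 = BehaviorModel([0], {"A": (0.6, [normal_pdf(0, 1)]),
                           "B": (0.4, [normal_pdf(3, 1)])})
label, _ = combine([sys1, sys2], [0.2, -0.1], rule="max")
print(label)  # A
```

The per-case cost is one PDF evaluation per KPI, cause and system plus a few sums and products, which is the source of the low execution cost claimed for the method.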
Fig. 4. (a) Normalized histogram for the KPI 95th percentile RSRP and two fitted PDFs: a generalized extreme value PDF in blue (round markers) and a normal PDF in red (square markers). (b) Normalized histogram for the KPI Retainability and a β PDF estimation. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
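The ML model selection of Section 3.1 (fit every candidate family, sort the fits by likelihood, keep the best) can be sketched with the Python standard library alone. As an assumption made for brevity, only the four Table 1 families whose ML estimates have closed forms are included (normal, log-normal, exponential and Rayleigh); families such as the generalized extreme value of Fig. 4a, beta, gamma or Weibull need numerical optimization, for which the continuous distributions in SciPy's `scipy.stats` provide a `fit` method. The function names are hypothetical, and this is not the authors' implementation.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# (family name, ML parameter estimates, log-likelihood at those estimates)
Fit = Tuple[str, Dict[str, float], float]


def fit_normal(xs: Sequence[float]) -> Optional[Fit]:
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n          # ML (1/n) variance
    if var <= 0:
        return None
    ll = -0.5 * n * (math.log(2 * math.pi * var) + 1)
    return "normal", {"mu": mu, "sigma": math.sqrt(var)}, ll


def fit_lognormal(xs: Sequence[float]) -> Optional[Fit]:
    if min(xs) <= 0:
        return None                                    # support is x > 0
    logs = [math.log(x) for x in xs]
    fit = fit_normal(logs)
    if fit is None:
        return None
    _, params, ll_logs = fit
    # Change of variables: p_X(x) = p_Y(ln x) / x, hence the -sum(ln x) term.
    return "log-normal", params, ll_logs - sum(logs)


def fit_exponential(xs: Sequence[float]) -> Optional[Fit]:
    if min(xs) < 0 or sum(xs) == 0:
        return None                                    # support is x >= 0
    n = len(xs)
    lam = n / sum(xs)                                  # ML rate = 1 / mean
    return "exponential", {"lambda": lam}, n * math.log(lam) - n


def fit_rayleigh(xs: Sequence[float]) -> Optional[Fit]:
    if min(xs) <= 0:
        return None
    n = len(xs)
    s2 = sum(x * x for x in xs) / (2 * n)              # ML estimate of sigma^2
    ll = sum(math.log(x) for x in xs) - n * math.log(s2) - n
    return "rayleigh", {"sigma": math.sqrt(s2)}, ll


CANDIDATES = (fit_normal, fit_lognormal, fit_exponential, fit_rayleigh)


def rank_families(xs: Sequence[float]) -> List[Fit]:
    """Fit each applicable family by ML and sort by log-likelihood, best first."""
    fits = [fit for fit in (cand(xs) for cand in CANDIDATES) if fit is not None]
    return sorted(fits, key=lambda fit: fit[2], reverse=True)


# Usage: midpoint quantiles of an Exp(1) variable stand in for one KPI subset.
n = 1000
kpi = [-math.log(1 - (i + 0.5) / n) for i in range(n)]
best_family, best_params, _ = rank_families(kpi)[0]
print(best_family)  # exponential
```

In the method, this selection is repeated for every KPI of every subset $w_m^{r*}$, so each system r yields one winning PDF per KPI-cause pair.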
Fig. 5. Proposed method for combining diagnosis systems. Stage 2: Combining the behavior models.
At this point, the fault cause with the maximum a posteriori probability is taken as the final diagnosis for each combination rule, $d_{Rule_t}$. That is,

$$
d_{Rule_t} = \arg\max_{m} \left\{ P(w_m \mid \mathbf{y})_{Rule_t} \right\}.
\tag{6}
$$

Note that a situation with $M^* < M$ means that at least one fault cause has not been identified by any system. In this case, it would be impossible for the ensemble to diagnose that cause.

4. Proof of concept

In this section, the proposed method is assessed by combining two different diagnosis models. In the first test, each model is provided by a different expert; in the second test, each model is built by a different machine learning algorithm from the same set of training cases.

The proposed method has been evaluated and compared to the baseline systems by means of the following figures of merit:
Table 3
Simulation parameters for normal cell functioning.
Parameter | Configuration

Table 4
Parameters used for modeling fault causes in Section 4.1 and a priori probabilities for each cause.
Fault cause | Parameters | A priori probability
Excessive downtilt | Downtilt = [16, 15, 14]° | 0.18
Coverage hole | hole = [49, 50, 52, 53] dBm | 0.09
Inter-system interf. | PTXmax = 33 dBm; Downtilt = 15°; Azimuth beamwidth = [30, 60]°; Elevation beamwidth = 10° | 0.1
Too late HO | HOM = [6, 7, 8] dBm | 0.23
Excessive uptilt | Downtilt = [0, 1]° | 0.21
Lack of coverage | PTXmax = [7, 8, 9, 10] dBm | 0.19

Table 5
Diagnosis models for the diagnosis systems used in test 1: used thresholds.
KPI | Thresholds [lower, upper]
Retainability | [0.973, 0.996]
HOSR | [0.899, 0.989]
RSRP [dBm] | [−76.9, −72.4]
RSRQ [dB] | [−18.8, −18.2]
SINR [dB] | [13, 14.5]
Throughput [kbps] | [96.2, 111.67]
Distance [km] | [0.838, 0.88]
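To make the semantics of the Table 5 thresholds concrete, the sketch below labels each KPI of a case and evaluates rules written in the format of Table 6. It is a crisp simplification, not the fuzzy logic controller used in the tests: as an assumption, a KPI is labeled "L" or "H" only when it falls outside the limits (no fuzzy membership degrees), "-" stands for the "–" of Table 6 (KPI not considered by the rule), and the first rule whose antecedent holds fires. The helpers `discretize` and `diagnose` are hypothetical, and only two rules of Table 6 are included.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Table 5 thresholds, KPI -> (lower, upper), in the column order of Table 6.
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "Retainability": (0.973, 0.996),
    "HOSR": (0.899, 0.989),
    "RSRP": (-76.9, -72.4),         # dBm
    "RSRQ": (-18.8, -18.2),         # dB
    "SINR": (13.0, 14.5),           # dB
    "Throughput": (96.2, 111.67),   # kbps
    "Distance": (0.838, 0.88),      # km
}


def discretize(case: Dict[str, float]) -> List[Optional[str]]:
    """Crisp labels: 'L' below the lower limit, 'H' above the upper limit, else None."""
    labels: List[Optional[str]] = []
    for kpi, (lower, upper) in THRESHOLDS.items():
        value = case[kpi]
        labels.append("L" if value < lower else "H" if value > upper else None)
    return labels


def diagnose(case: Dict[str, float],
             rules: Sequence[Tuple[str, int]]) -> Optional[int]:
    """Return the cause of the first rule whose antecedent holds ('-' = not considered)."""
    labels = discretize(case)
    for antecedent, cause in rules:
        if all(a == "-" or a == lab for a, lab in zip(antecedent.split(), labels)):
            return cause
    return None


# First two rules of diagnosis model 1 in Table 6 (its '–' written as '-').
RULES = [("L L H L - H L", 1), ("H H - L L H L", 1)]

case = {"Retainability": 0.95, "HOSR": 0.85, "RSRP": -70.0, "RSRQ": -19.0,
        "SINR": 14.0, "Throughput": 120.0, "Distance": 0.80}
print(discretize(case))       # ['L', 'L', 'H', 'L', None, 'H', 'L']
print(diagnose(case, RULES))  # 1
```

In the actual FLC, each antecedent is fulfilled to a degree, so a case close to a threshold can partially activate several rules; the crisp version above discards that information.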
• Retainability, given as a percentage. This performance indicator quantifies the ability of the cell to hold the service once it has been accepted by the admission control. It gives an idea of how often a user experiences a call drop.
• Handover success rate (HOSR), given as a percentage. This KPI measures the ability of the network to provide mobility to a user without losing the connection. It can be calculated as the ratio between the number of successful handovers and the total number of HOs.
• 95th percentile RSRP, given in dBm. The Reference Signal Received Power (RSRP) is defined as the linear average over the power contributions (in W) of the resource elements that carry cell-specific reference signals within the considered measurement frequency bandwidth.
• 5th percentile RSRQ, given in dB. The Reference Signal Received Quality (RSRQ) is a signal quality indicator defined as the ratio

$$
\mathrm{RSRQ} = \frac{N_{PRB} \cdot \mathrm{RSRP}}{\mathrm{RSSI}},
\tag{8}
$$

where $N_{PRB}$ is the number of resource blocks of the E-UTRA carrier RSSI measurement bandwidth and RSSI stands for the total received power within the measurement bandwidth, that is, considering the power from the serving cell, the power of the co-channel serving and non-serving cells, the adjacent channel interference and any possible source of noise. In this paper, RSRQ is expressed in dB.
• 95th percentile SINR, given in dB. The Signal-to-Interference-plus-Noise Ratio (SINR) is defined as the ratio between the power of the desired data signal and the sum of the powers of all inter-cell interferences and the noise.
• 95th percentile distance, given in km. This KPI measures the distance between users and their serving cell. It can be estimated from the transmission delay between them and gives an idea of the cell coverage area.
• Average throughput, given in kbps. In LTE systems, the user throughput depends on the SINR experienced by the user that

4.1.2. The standalone classifiers

In this test, for a given automatic diagnosis technique, two diagnosis models are combined (R = 2), each of them provided by a different expert. This test represents the usual case in cellular networks, where each troubleshooting expert defines their own set of rules and KPI thresholds to identify problems. When the diagnosis system is deployed in a network according to the proposed method, instead of choosing one single model, the knowledge from both experts is fused by combining the two diagnosis models. Furthermore, both diagnosis models comprise the six fault causes and the seven KPIs described above. That is, $W^1 = W^2$ with $|W^1| = M$ and $N^1 = N^2$.

The artificial intelligence technique used for these tests is based on a Fuzzy Logic Controller (FLC) (Khatib, Barco, Gómez-Andrades, & Serrano, 2015). This system contains rules composed of an antecedent (the "if ..." part) and a consequent (the "then ..." part), the latter being the cause the fuzzy logic controller assigns to a case if the antecedent is fulfilled according to the fuzzified observable features of the case. On the one hand, Table 5 shows the thresholds used in both diagnosis models. The lower limit stands for the value below which a KPI is considered to be low; the upper limit stands for the value above which a KPI is considered to be high. On the other hand, Table 6 shows the if ..., then ... rules that make up each diagnosis model, as given by each expert. From left to right, each KPI column in Table 6 corresponds to the KPIs shown in Table 5. H stands for a high value of that KPI and L for a low value. Regarding the numbering of the diagnoses, 1 means excessive downtilt; 2, coverage hole; 3, inter-system interference; 4, too late handover; 5, excessive uptilt; and 6, lack of coverage.

4.1.3. Results

Table 7 shows the diagnosis error rates (DER) computed when the Max rule (Table 2) is used for combining. In Table 7, the average diagnosis error rate and the rate of improvement are shown. The latter represents the percentage of repetitions (among the 50 that have been performed) in which the diagnosis error rate from
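Table 6
Diagnosis models for the diagnosis systems used in test 1: used rules.

Diagnosis model 1
Retainability | HOSR | RSRP | RSRQ | SINR | Throughput | Distance | Diagnosis
L | L | H | L | – | H | L | 1
H | H | – | L | L | H | L | 1
L | – | – | H | H | – | H | 2
L | – | – | H | L | L | H | 3
L | L | H | – | L | L | H | 3
– | – | H | H | H | H | – | 4
– | H | H | – | H | H | – | 4
H | – | H | – | H | H | – | 4
– | – | H | – | H | H | H | 4
H | H | – | – | H | – | H | 4
H | H | – | L | – | – | H | 4
H | H | – | – | – | H | H | 4
H | H | H | – | – | – | H | 4
– | – | H | L | H | – | H | 4
– | – | H | L | – | L | H | 4
L | L | H | L | – | – | H | 4
– | – | L | – | L | L | H | 5
– | – | L | – | L | H | L | 6
L | – | – | L | L | H | L | 6
– | H | – | L | L | H | L | 6
H | H | – | – | L | H | L | 6
L | L | L | L | L | – | L | 6

Diagnosis model 2
Retainability | HOSR | RSRP | RSRQ | SINR | Throughput | Distance | Diagnosis
– | – | H | L | – | H | L | 1
L | – | H | H | – | L | H | 2
L | – | H | H | H | – | H | 2
L | – | – | H | L | L | H | 3
L | – | H | – | L | L | H | 3
L | – | H | H | L | L | – | 3
L | H | – | H | L | L | – | 3
– | H | H | H | L | L | H | 3
L | L | – | – | – | H | H | 4
L | L | – | L | – | – | H | 4
L | L | H | – | H | L | – | 4
L | L | H | – | H | – | H | 4
L | L | L | L | L | L | – | 4
– | – | L | H | – | L | H | 5
H | H | – | H | – | L | H | 5
H | H | L | – | L | L | H | 5
– | – | – | L | L | H | L | 6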
Table 7
Results of test 1: Combining two versions of the same classifying algorithm.
Modeling-to-testing ratio | 25% | 50% | 75%
Diagnosis syst. 1, average DER | 13.81% | 13.7% | 13.65%
Diagnosis syst. 2, average DER | 16.34% | 16.13% | 16.3%
Ens. method: Max rule, average DER | 8.29% | 5.92% | 5.34%
Rate of improvement | 60% | 98% | 100%

Table 8
Main parameters of the real LTE network used in test two.
Parameter | Configuration
Network layout | Urban area
Number of cells | 8679
System bandwidth | 10 MHz
Number of PRBs | 50
Frequency reuse factor | 1
Max. transmitted power | 46 dBm
Max. transmitted power of UE | 23 dBm
Horizontal HPBW (Half-Power Beam Width) | 65°
HOM | 3 dB
KPI time period | Hourly
Number of observed cells | 45
Number of days under observation | 6 days per cell (on average)
Size of the dataset | 14,692 labeled cases

the ensemble method is lower than the best one provided by the baseline diagnosis systems. With a modeling-to-testing ratio of 25%, only 60% of the iterations show a lower ensemble diagnosis error rate than those of its base diagnosis systems, so the improvement is far less consistent than at higher ratios, even though the average DER (8.29%) is already below those of both baseline systems. This result highlights how the scarcity of cases for modeling affects the classification performance of the ensemble. However, if the number of cases used for modeling is doubled, 98% of the iterations show a lower diagnosis error rate, which also results in a lower average diagnosis error rate. When the modeling-to-testing ratio is set to 75%, every diagnosis error rate provided by the ensemble method is lower than the lowest one provided by its components, reaching 5.34% on average. This means a DER of less than 40% of the lowest DER achieved by the standalone classifiers (5.34% vs. 13.65%).

Regarding the DER of the standalone diagnosis systems, it can be seen that it remains nearly constant across modeling-to-testing ratios. This is due to the random process used to divide the labeled cases into the modeling and testing subsets. When this random permutation is performed a number of times and some subsets (two, in this case) are chosen blindly from this set, the average proportion of cases labeled with a given cause in each of these subsets tends to the proportion of that label in the original set. This is a consequence of the law of large numbers. For this reason, the resulting averaged DER of these baseline systems is independent of the size of the subsets drawn from the original set of cases.

4.2. Combination of different diagnosis systems on a live network

Once the proposed method had been tested with cases provided by a simulator, a second test with cases from a real live LTE network was performed. In this test, the diagnosis models built by two different machine learning algorithms have been combined.

4.2.1. Scenario

An LTE network composed of more than 8000 different cells providing coverage to almost 4 million people has been analyzed. Its vastness means that many different cells coexist and a wide variety of problematic causes come up. Table 8 summarizes the main parameters of the network. Among all the available candidates, 45 random cells have been chosen to represent the network behavior. These cells have been monitored for almost 6 days on average, and their KPIs have been stored on an hourly basis. Taking into account that the state of a single cell varies substantially throughout the day due to traffic fluctuation, several cases have been stored from each cell at different hours, resulting in a total of 14,692 cases. Once these cases were gathered, they were all labeled by the experts, distinguishing four groups of cases (M = 4): three kinds of problematic patterns and the normal cell functioning. The causes of malfunctioning that were found are:

• Overload: This fault cause is mainly distinguished by a high number of RRC connections in the cell, which makes the CPU processing load and the number of HO attempts raise conse-
D. Palacios et al. / Expert Systems With Applications 64 (2016) 56–68 65
Table 9
Prior probability of occurrence for the causes considered in test 2, $P(w_m)$.

Table 10
Diagnosis models for the diagnosis systems used in test 2: used thresholds.

Table 11
Diagnosis models for the diagnosis systems used in test 2: used rules.
H H – – L L 1   |   H H L – L L 1
H – H H L L 1   |   H H – H L L 1
– H H L – L 2   |   H – H H L L 1
H – – L – H 2   |   – L H L – H 2
H – – L H – 2   |   – H H – H H 2
H – H L – – 2   |   – – H L H H 2
– L H L – H 2   |   L – H – H H 2
H H – – H – 2   |   H H H – H – 2
– H H – H – 2   |   H L – L L H 2
– – H L H – 2   |   H – H L – H 2
H – H – H H 2   |   H – H L H – 2
H – H – H H 2   |   – L L L L L 3
– L L L L L 3   |   L L L L H – 4
L L – – H H 4   |   – L L L H H 4
L L L – H – 4   |   L L L – H H 4
L L H L L L 4   |   L – L L H H 4
L L L – – H 4   |
L – L – H H 4   |
– – L L H H 4   |
L L L H – – 4   |
in the worst case, in unnoticed service outages and degradation of the network performance. In this regard, the proposed method has proved to successfully reduce the FNR. Other indicators are not as critical. For example, mistaking one fault cause for another may be tolerable to some extent (DER): even if the actual problem is not the one the operator believes it to be, the operator is still aware of a problem in the network. Even diagnosing normal cases as faulty may be tolerable, since the network performance is not actually degraded (FPR).

These results can also be seen in the normalized confusion matrices of the diagnosis methods. Fig. 6a shows the normalized confusion matrix for the FLC using the genetic algorithm for rule learning, Fig. 6b shows the confusion matrix when the data driven algorithm was used to learn the rules, and Fig. 6c shows the matrix obtained by applying the median rule with a 60% modeling-to-testing ratio in the ensemble method. In these matrices, the elements of the fourth column (excluding the main diagonal) account for the false negatives and the elements of the fourth row (excluding the main diagonal) account for the false positives. It can be seen that the elements of the main diagonal are reinforced in the ensemble method and that only the misdiagnoses made by both baseline systems are slightly inherited by the ensemble. Fig. 6c also shows graphically how the FPR and FNR dropped with respect to those of the standalone systems.

Table 12
Results of test 2: Combining two different algorithms.

                                    DER     FPR      FNR     OER
Training: Data driven algorithm     2.62%   16.91%   6.47%   11.43%
Training: Genetic algorithm         1.87%   16.61%   2.68%   8.16%
Ensemble method
  Product rule                      2.6%    12.21%   1.32%   6.2%
  Sum rule                          1.78%   11.55%   1.25%   5.59%
  Max rule                          1.78%   11.51%   1.25%   5.57%
  Min rule                          2.05%   11.42%   1.4%    5.84%
  Median rule                       1.78%   10.67%   1.34%   5.39%
  Majority vote rule                1.78%   11.23%   1.25%   5.49%

5. Future lines of work

• Decision templates. The proposed method does not punish or reward the classifiers according to their performance during the training stage. Going a step beyond the weighted majority vote used in Wei et al. (2014), a score system based on class- and classifier-aware decision templates, applied over the a posteriori probabilities $P(w_m^r \mid x)$ from Eq. (4), could be used to improve the overall accuracy.
• Non-parametric PDFs. Using analytically defined, parametric PDFs is a very lightweight way of representing statistical behavior, as only their parameters (Table 1) must be stored to model a diagnosis system. However, these distributions may limit, to some extent, the statistical representation of the features of the training cases, and they may introduce a source of error in the subsequent computation of $P(w_m^r)$ if these cases follow a distribution that has not been considered. To solve this, future research could focus on non-parametric PDFs, such as kernel-based ones.
Probability density functions may be classified into parametric and non-parametric functions. The former have analytic expressions whose shape depends on their parameters. The latter, such as kernel-based PDFs, are built directly from the data by means of a kernel function: if all the cases of the dataset are placed along the axis given by a feature of interest (a certain KPI, for example) and a kernel function is centered on each point, an empirical non-parametric PDF results from summing these functions and dividing by the number of cases. The main advantage of this method is its accuracy when modeling an empirical distribution. Its main drawback is that, since it is not defined by a set of parameters, it must be computed and stored point by point, which may increase the storage and computing requirements. This method may, however, be combined with the feature extraction approach described next: first, a reduced set of synthetic KPIs is computed and then their PDFs are accurately estimated with this method.
• Use of synthetic KPIs via feature extraction. As described in Section 3.1, $N_r \times M^{*} \times R$ PDFs must be estimated in order to model all the feature-class-classifier relations. If any of these factors is relatively high, the computing cost of estimating all these PDFs could be prohibitive. Therefore, working with a reduced group of synthetic (extracted) features is proposed, mapping the N original features into $\hat{N}$ synthetic features with $\hat{N} < N$.
In recent years, mainly driven by the growth of data mining, many dimensionality reduction methods have arisen. Among these, Principal Component Analysis (PCA) (Jolliffe, 2002) is worth highlighting. In an N-dimensional vector space, the simplest version of PCA (linear PCA) finds the orthogonal directions onto which the projections of the samples have the highest variance, the projections onto different directions being mutually uncorrelated. The result is a set of orthogonal vectors sorted in descending order of achieved variance; the first of these is the vector onto which the variance of the projected samples is maximum. In this sense, the original KPIs constitute the basis of the N-dimensional vector space, whereas the $\hat{N}$ synthetic KPIs correspond to the orthogonal vectors with the highest variance. Strictly speaking, up to N orthogonal synthetic KPIs may be computed. However, only a small subset of them, the first $\hat{N}$, is typically enough to account for most of the variance of the data. By applying this technique, which is based on the eigenvalue decomposition of the covariance matrix of the original KPIs, the N original KPIs can be mapped into $\hat{N}$ synthetic KPIs while preserving most of the information they contain.

6. Conclusions

A hybrid ensemble of classifiers, devised to merge expert knowledge from different sources, has been presented and assessed in the context of fault cause diagnosis in cellular networks. It allows the expertise of several troubleshooting experts and the knowledge contained in databases of previously diagnosed cases to be combined in order to develop a more accurate diagnosis system.

Unlike the common approach to hybrid ensembles, based on the majority vote of their baseline components, this work proposes a hybrid ensemble of classifiers obtained by combining the statistical behavior models of the baseline diagnosis systems. This approach makes it possible to obtain the partial diagnoses of the standalone classifiers and to combine them by simply applying algebraic rules, without actually running those classifiers on every case under test, thus reducing the computational cost compared with usual hybrid ensembles of classifiers.

The method has been tested with two different sources of cases under test: cases provided by an LTE RAN simulator and cases gathered from a real live LTE network. Likewise, two use cases have been assessed: the combination of diagnosis models designed by two different network troubleshooting experts and the combination of two diagnosis systems using different learning algorithms. The proposed method outperformed its base components in both tests in terms of the diagnosis error rate, proving to be an effective tool for fault cause diagnosis in current and future self-healing networks.
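The combination step summarized above relies on simple algebraic rules. The following Python sketch illustrates the fixed rules compared in Table 12 (product, sum, max, min, median and majority vote) in their usual form from Kittler et al. (1998). It is not the authors' implementation: it assumes the per-system posterior vectors $P(w_m^r \mid x)$ are already available (the statistical modeling stage is not reproduced), it assumes equal class priors so that the prior-correction terms of the product and sum rules drop out, and the function name `combine` and the posterior values are hypothetical.

```python
from math import prod
from statistics import median

RULES = ("product", "sum", "max", "min", "median", "majority")


def combine(posteriors, rule):
    """Return the 0-based index of the cause selected by a fixed rule.

    posteriors: list of R lists; row r holds the M posterior probabilities
    (one per cause) assumed to be estimated for baseline system r.
    Equal class priors are assumed, so the prior-correction terms of the
    product and sum rules (Kittler et al., 1998) are omitted.
    Ties are resolved in favor of the lowest cause index.
    """
    n_causes = len(posteriors[0])
    # per_cause[m] collects the R estimates for cause m.
    per_cause = [[p[m] for p in posteriors] for m in range(n_causes)]
    if rule == "product":
        scores = [prod(v) for v in per_cause]
    elif rule == "sum":
        scores = [sum(v) for v in per_cause]
    elif rule == "max":
        scores = [max(v) for v in per_cause]
    elif rule == "min":
        scores = [min(v) for v in per_cause]
    elif rule == "median":
        scores = [median(v) for v in per_cause]
    elif rule == "majority":
        # Each system casts one vote for its own most probable cause.
        scores = [0] * n_causes
        for p in posteriors:
            scores[max(range(n_causes), key=p.__getitem__)] += 1
    else:
        raise ValueError(f"unknown rule: {rule}")
    # max() returns the first maximal index, i.e. the lowest one on ties.
    return max(range(n_causes), key=scores.__getitem__)


# Hypothetical posteriors of two baseline systems over four causes.
P = [[0.02, 0.90, 0.05, 0.03],
     [0.02, 0.01, 0.52, 0.45]]
for rule in RULES:
    print(rule, combine(P, rule))
# Expected lines: product 2, sum 1, max 1, min 2, median 1, majority 1
```

With only two baseline systems, as in test 2, the median of two values equals their mean and the majority vote can tie; the sketch breaks ties in favor of the lowest cause index, which is an arbitrary choice. The example also shows that the rules need not agree: the product and min rules are dominated by the system that assigns a cause a near-zero probability.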
Acknowledgment

This work has been partially funded by Optimi-Ericsson, Junta de Andalucía (Consejería de Ciencia, Innovación y Empresa, Ref. 59288 and Proyecto de Investigación de Excelencia P12-TIC-2905) and ERDF.

References

3GPP (a). Evolved Universal Terrestrial Radio Access (E-UTRA) Radio Resource Control (RRC); Protocol Specification, Rel-13, Version 13.2.0 (2015-12). TS 36.331. 3rd Generation Partnership Project.
3GPP (b). Feasibility study for Further Advancements for E-UTRA (LTE-Advanced), Rel-13, Version 13.0.0 (2015-12). TR 36.912. 3rd Generation Partnership Project.
3GPP (c) (May 2004). OFDM-HSDPA System level simulator calibration (R1-040500). 3GPP TSG-RAN WG1 37. 3rd Generation Partnership Project (3GPP).
3GPP (d). Self-Organizing Networks (SON); Concepts and requirements, Rel-13, Version 13.0.0 (2015-12). TS 32.500. 3rd Generation Partnership Project.
3GPP (e). Self-Organizing Networks (SON); Self-Healing concepts and requirements, Rel-13, Version 13.0.0 (2015-12). TS 32.541. 3rd Generation Partnership Project.
Barco, R., Díez, L., Wille, V., & Lázaro, P. (2009). Automatic diagnosis of mobile communication networks under imprecise parameters. Expert Systems with Applications, 36(1), 489–500. doi:10.1016/j.eswa.2007.09.030.
Barco, R., Lazaro, P., Diez, L., & Wille, V. (2008). Continuous versus discrete model in autodiagnosis systems for wireless networks. IEEE Transactions on Mobile Computing, 7(6), 673–681. doi:10.1109/TMC.2008.23.
Barco, R., Lázaro, P., Wille, V., Díez, L., & Patel, S. (2009). Knowledge acquisition for diagnosis model in wireless networks. Expert Systems with Applications, 36(3), 4745–4752. doi:10.1016/j.eswa.2008.06.042.
Barco, R., Lázaro, P., & Muñoz, P. (2012). A unified framework for Self-Healing in wireless networks. IEEE Communications Magazine, 50(12), 134–142. doi:10.1109/MCOM.2012.6384463.
Begum, S., Chakraborty, D., & Sarkar, R. (2015). Cancer classification from gene expression based microarray data using SVM ensemble. In 2015 international conference on condition assessment techniques in electrical systems (CATCON) (pp. 13–16). doi:10.1109/CATCON.2015.7449500.
Breiman, L. (1996). Bagging predictors. In Machine learning (pp. 123–140).
Ciocarlie, G., Lindqvist, U., Nováczki, S., & Sanneck, H. (2013). Detecting anomalies in cellular networks using an ensemble method. In Proceedings of the 9th international conference on network and service management (CNSM 2013) (pp. 171–174). doi:10.1109/CNSM.2013.6727831.
Dasarathy, B., & Sheela, B. V. (1979). A composite classifier system design: concepts and methodology. Proceedings of the IEEE, 67(5), 708–713. doi:10.1109/PROC.1979.11321.
Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119–139. doi:10.1006/jcss.1997.1504.
Gandhi, I., & Pandey, M. (2015). Hybrid ensemble of classifiers using voting. In 2015 international conference on green computing and internet of things (ICGCIoT) (pp. 399–404). doi:10.1109/ICGCIoT.2015.7380496.
Gómez-Andrades, A., Muñoz Luengo, P., Khatib, E., de la Bandera Cascales, I., Serrano, I., & Barco, R. (2015). Methodology for the design and evaluation of Self-Healing LTE networks. IEEE Transactions on Vehicular Technology, PP(99), 1–1. doi:10.1109/TVT.2015.2477945.
Gómez-Andrades, A., Muñoz, P., Serrano, I., & Barco, R. (2016). Automatic root cause analysis for LTE networks based on unsupervised techniques. IEEE Transactions on Vehicular Technology, 65(4), 2369–2386. doi:10.1109/TVT.2015.2431742.
Jacobs, R. A., Jordan, M. I., Nowlan, S. J., & Hinton, G. E. (1991). Adaptive mixtures of local experts. Neural Computation, 3(1), 79–87. doi:10.1162/neco.1991.3.1.79.
Jolliffe, I. (2002). Principal component analysis (2nd ed.). Springer Series in Statistics. Springer-Verlag New York.
Khatib, E. J., Barco, R., Gómez-Andrades, A., Muñoz, P., & Serrano, I. (2015). Data mining for fuzzy diagnosis systems in LTE networks. Expert Systems with Applications, 42(21), 7549–7559. doi:10.1016/j.eswa.2015.05.031.
Khatib, E. J., Barco, R., Gómez-Andrades, A., & Serrano, I. (2015). Diagnosis based on genetic fuzzy algorithms for LTE Self-Healing. IEEE Transactions on Vehicular Technology. doi:10.1109/TVT.2015.2414296.
Kittler, J., Hatef, M., Duin, R. P. W., & Matas, J. (1998). On combining classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3), 226–239. doi:10.1109/34.667881.
Kuncheva, L. (2002). A theoretical study on six classifier fusion strategies. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(2), 281–286. doi:10.1109/34.982906.
Liu, H., Chen, G., Song, G., & Han, T. (2009). Analog circuit fault diagnosis using bagging ensemble method with cross-validation. In International conference on mechatronics and automation, 2009. ICMA 2009 (pp. 4430–4434). doi:10.1109/ICMA.2009.5246675.
Mehlführer, C., Wrulich, M., Colom Ikuno, J., Bosanska, D., & Rupp, M. (2009). Simulating the long term evolution physical layer. In Proc. of 17th European signal processing conference (EUSIPCO).
Muñoz, P., de la Bandera, I., Ruíz, F., Luna-Ramírez, S., Barco, R., Toril, M., et al. (2011). Computationally-efficient design of a dynamic system-level LTE simulator. International Journal of Electronics and Telecommunications, 57(3), 347–358. doi:10.1155/2012/802606.
Nováczki, S. (2013). An improved anomaly detection and diagnosis framework for mobile network operators. In 2013 9th international conference on the design of reliable communication networks (DRCN) (pp. 234–241).
Shen, H.-B., & Chou, K.-C. (2006). Ensemble classifier for protein fold pattern recognition. Bioinformatics, 22(14), 1717–1722. doi:10.1093/bioinformatics/btl170.
Szilágyi, P., & Nováczki, S. (2012). An automatic detection and diagnosis framework for mobile communication systems. IEEE Transactions on Network and Service Management, 9(2), 184–197. doi:10.1109/TNSM.2012.031912.110155.
Wei, H., Lin, X., Xu, X., Li, L., Zhang, W., & Wang, X. (2014). A novel ensemble classifier based on multiple diverse classification methods. In 2014 11th international conference on fuzzy systems and knowledge discovery (FSKD) (pp. 301–305). doi:10.1109/FSKD.2014.6980850.
Wiezbicki, T., & Ribeiro, E. P. (2016). Sensor drift compensation using weighted neural networks. In 2016 IEEE conference on evolving and adaptive intelligent systems (EAIS) (pp. 92–97). doi:10.1109/EAIS.2016.7502497.
Wu, X., Kumar, V., Ross Quinlan, J., Ghosh, J., Yang, Q., Motoda, H., et al. (2008). Top 10 algorithms in data mining. Knowledge and Information Systems, 14(1), 1–37. doi:10.1007/s10115-007-0114-2.
Yuksel, S., Wilson, J., & Gader, P. (2012). Twenty years of mixture of experts. IEEE Transactions on Neural Networks and Learning Systems, 23(8), 1177–1193. doi:10.1109/TNNLS.2012.2200299.