A Survey of Multiple Classifier Systems as Hybrid Systems (2014)
Information Fusion
journal homepage: www.elsevier.com/locate/inffus

Article history: Available online 29 April 2013
Keywords: Combined classifier; Multiple classifier system; Classifier ensemble; Classifier fusion; Hybrid classifier

Abstract: A current focus of intense research in pattern classification is the combination of several classifier systems, which can be built following either the same or different models and/or dataset-building approaches. These systems perform information fusion of classification decisions at different levels, overcoming limitations of traditional approaches based on single classifiers. This paper presents an up-to-date survey on multiple classifier systems (MCS) from the point of view of Hybrid Intelligent Systems. The article discusses major issues, such as diversity and decision fusion methods, providing a vision of the spectrum of applications that are currently being developed.
http://dx.doi.org/10.1016/j.inffus.2013.04.006
Historical perspective. The concept of MCS was first presented by Chow [4], who gave conditions for the optimality of the joint decision¹ of independent binary classifiers with appropriately defined weights. In 1979 Dasarathy and Sheela combined a linear classifier and a k-NN classifier [6], suggesting to identify the region of the feature space where the classifiers disagree. The k-NN classifier gives the answer of the MCS for objects coming from the conflictive region, and the linear one for the remaining objects. Such a strategy significantly decreases the exploitation cost of the whole classifier system. This was the first work introducing the classifier selection concept; the same idea was developed independently in 1981 by Rastrigin and Erenstein [7], performing first a feature space partitioning and, second, assigning to each partition region the individual classifier that achieves the best classification accuracy over it. Other early relevant works formulated conclusions regarding the classification quality of MCS: [8] considered a neural network ensemble, [9] applied majority voting to handwriting recognition, Tumer and Ghosh [10] showed in 1996 that averaging the outputs of an infinite number of unbiased and independent classifiers can lead to the same response as the optimal Bayes classifier, and Ho et al. [11] underlined that a decision combination function must receive a useful representation of each classifier's decision; specifically, they considered several methods based on decision ranks, such as the Borda count. Finally, the landmark works introducing bagging [12] and boosting [13,14] showed how to produce strong classifiers [15], in the PAC (Probably Approximately Correct) theory [16] sense, on the basis of weak ones. Nowadays MCS are highlighted by review articles as a hot topic and a promising trend in pattern recognition [17–21]. These reviews include the books by Kuncheva [22], Rokach [23], Seni and Elder [24], and Baruque and Corchado [25]. Even leading-edge general machine learning handbooks such as [26–28] include extensive presentations of MCS concepts and architectures. The popularity of this approach is confirmed by the growing trend in the number of publications shown in Fig. 2. The figure reproduces the evolution of the number of references retrieved by the application of specific keywords related to MCS since 1990. The experiment was repeated on three well known academic search sites. The growth in the number of publications has an exponential trend. The last entry of the plots corresponds to the last 2 years, and some of the keywords give as many references as in the previous 5 years.

¹ We can retrace decision combination a long way back in history. Perhaps the first worthy reference is the Greek democracy (meaning government of the people), ruling that full citizens have an equal say in any decision that affects their life. Greeks believed in the community wisdom, meaning that the rule of the majority will produce the optimal joint decision. In 1785 Condorcet formulated the Jury Theorem about the misclassification probability of a group of independent voters [5], providing the first result measuring the quality of a classifier committee.

Fig. 2. Evolution of the number of publications per year range retrieved from the keywords specified in the plot legend. Each plot corresponds to a searching site: the top to Google Scholar, the center to the Web of Knowledge, the bottom to Scopus. The first entry of the plots is for publications prior to 1990. The last entry is only for the last 2 years.
Advantages. Dietterich [29] summarized the benefits of MCS: (a) allowing to filter out hypotheses that, though accurate, might be incorrect due to a small training set, (b) combining classifiers trained starting from different initial conditions could overcome the local optima problem, and (c) the true function may be impossible to model by any single hypothesis, but combinations of hypotheses may expand the space of representable functions. Rephrasing this, there is widespread acknowledgment of the following advantages of MCS:

- MCS behave well in the two extreme cases of data availability: when we have very scarce data samples for learning, and when we have a huge amount of them at our disposal. In the scarcity case, MCS can exploit bootstrapping methods, such as bagging or boosting. Intuitive reasoning justifies that the worst classifier would be left out of the selection by this method [30], e.g., by individual classifier output averaging [31]. In the event of availability of a huge amount of learning data samples, MCS allow training classifiers on dataset partitions and merging their decisions using an appropriate combination rule [20].
- A combined classifier can outperform the best individual classifier [32]. Under some conditions (e.g., majority voting by a group of independent classifiers) this improvement has been proven analytically [10].
- Many machine learning algorithms are de facto heuristic search algorithms. For example, the popular decision tree induction method C4.5 [33] uses a greedy search approach, choosing the search direction according to a heuristic attribute evaluation function. Such an approach does not assure an optimal solution. Thus, the combined algorithm, which could start its work from different initial points of the search space, is equivalent to a multi-start local random search, which increases the probability of finding an optimal model.
- MCS can easily be implemented in efficient computing environments such as parallel and multithreaded computer architectures [34]. Another attractive implementation area is distributed computing systems (i.e., P2P, Grid or Cloud computing) [35,36], especially when a database is partitioned for privacy reasons [37], so that partial solutions must be computed on each partition and only the final decision is available as the combination of the networked decisions.
- Wolpert stated that each classifier has its specific competence domain [3], where it overcomes other competing algorithms; thus it is not possible to design a single classifier which outperforms all others for every classification task. MCS try to always select the locally optimal model from the available pool of trained classifiers.

System structure. The general structure of MCS is depicted in Fig. 3, following a classical pattern recognition [38] application structure. The most informative or discriminant features describing the objects are input to the classifier ensemble, formed by a set of complementary and diverse classifiers. An appropriate fusion method combines the individual classifier outputs optimally to provide the system decision. According to Ho [39], two main MCS design approaches can be distinguished. On one hand, the so-called coverage optimization approach tries to cover the space of possible models by the generation of a set of mutually complementary classifiers whose combination provides optimal accuracy. On the other hand, the so-called decision optimization approach concentrates on designing and training an appropriate decision combination function over a set of individual classifiers given in advance [40]. The main issues in MCS design are:

- System topology: how to interconnect individual classifiers.
- Ensemble design: how to drive the generation and selection of a pool of valuable classifiers.
- Fuser design: how to build a decision combination function (fuser) which can exploit the strengths of the selected classifiers and combine them optimally.

2. System topology

Fig. 4 illustrates the two canonical topologies employed in MCS design. The overwhelming majority of MCS reported in the literature is structured in a parallel topology [22]. In this architecture, each classifier is fed the same input data, so that the final decision of the combined classifier is made on the basis of the outputs of the individual classifiers obtained independently. Alternatively, in the serial (or conditional) topology, individual classifiers are applied in sequence, implying some kind of ranking or ordering over them. When the primary classifier cannot be trusted to classify a given object, e.g., because of the low support/confidence in its result, then the data is fed to a secondary classifier [41,42], and so on, adding classifiers in sequence. This topology is adequate when the cost of classifier exploitation is important, so that the primary classifier is the computationally cheapest one, and secondary classifiers have higher exploitation cost [43]. This model can be applied to classifiers with the so-called reject option as well [44]. In [45] the first classifier in the pipeline gives an estimation of the certainty of the classification, so that uncertain data samples are sent to a second classifier specialized in difficult instances. We notice the similarity of such an approach to an ordered set of rules [46] or a decision list [47], when we consider each rule as the classifier.

Fig. 4. The canonical topologies of MCSs: parallel (top) and serial (bottom).
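To make the serial topology concrete, the following minimal sketch shows a two-stage cascade in which a cheap primary classifier answers whenever its maximum support exceeds a confidence threshold and defers the remaining objects to a more expensive secondary classifier. The class name, the chosen base models, the threshold value, and the toy data are illustrative assumptions, not taken from the cited works.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier

class SerialMCS:
    """Two-stage cascade: a cheap primary classifier answers confident cases,
    the rest are deferred to a more expensive secondary classifier."""

    def __init__(self, primary, secondary, threshold=0.9):
        self.primary = primary          # computationally cheap model
        self.secondary = secondary      # more expensive, used only on rejected objects
        self.threshold = threshold      # minimum support to accept the primary answer

    def fit(self, X, y):
        self.primary.fit(X, y)
        self.secondary.fit(X, y)        # here both see all data; a real system might
        return self                     # train the secondary only on "difficult" samples

    def predict(self, X):
        proba = self.primary.predict_proba(X)
        labels = self.primary.classes_[proba.argmax(axis=1)]
        confident = proba.max(axis=1) >= self.threshold
        if (~confident).any():          # defer low-support objects to the secondary stage
            labels[~confident] = self.secondary.predict(X[~confident])
        return labels

# toy usage on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
cascade = SerialMCS(GaussianNB(), RandomForestClassifier(n_estimators=50), threshold=0.8)
print(cascade.fit(X, y).predict(X[:5]))
```

In a cost-sensitive deployment the secondary stage could itself be another classifier chain, since the serial topology allows adding stages in sequence.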
A very special case of the sequential topology is AdaBoost, introduced by Freund and Schapire in 1995 [48] and widely applied in data mining problems [49]. The goal of boosting is to enhance the accuracy of any given learning algorithm, even weak learning algorithms with an accuracy slightly better than chance. Schapire [50] showed that weak learners can be boosted into a strong learning algorithm by sequentially focusing on the subset of the training data that is hardest to classify. The algorithm performs training of the weak learner multiple times, each time presenting it with an updated distribution over the training examples. The distribution is altered so that hard parts of the feature space have higher probability, i.e., trying to achieve a hard margin distribution. The decisions generated by the weak learners are combined into a final single decision. The novelty of AdaBoost lies in the adaptability of the successive distributions to the results of the previous weak learners, hence the name Adaptive Boosting. In the words of Kivinen et al. [51], AdaBoost finds a new distribution that is closest to the old one while respecting the restriction that the new distribution must be orthogonal to the mistake vector of the current weak learner.

3. Ensemble design

Viewing MCS as a case of robust software [52–55], diversity arises as the guiding measure of the design process. Classifier ensemble design aims to include mutually complementary individual classifiers which are characterized by high diversity and accuracy [56]. The emphasis from the Hybrid Intelligent System point of view is on building MCS from components following different kinds of modeling and learning approaches, expecting an increase in diversity and a decrease in classifier output correlation [57]. Unfortunately, the problem of how to measure classifier diversity is still an open research topic. Brown et al. [58] notice that we can ensure diversity using implicit or explicit approaches. Implicit approaches include techniques of independent generation of individual classifiers, often based on random techniques, while explicit approaches focus on the optimization of a diversity metric over a given ensemble line-up. In this second kind of approach, individual classifier training is performed conditionally on the previous classifiers, with the aim of exploiting the strengths of valuable members of the classifier pool. This section discusses some diversity measures and the procedures followed to ensure diversity in the ensemble.

3.1. Diversity measures

For regression problems, the variance of the outputs of ensemble members is a convenient diversity measure, because it was proved that the error of a compound model based on a weighted averaging of individual model outputs can be reduced by increasing diversity [56,59]. Brown et al. [60] showed a functional relation between diversity and individual regressor accuracy, allowing to control the bias-variance tradeoff systematically.

For classification problems such theoretical results have not been proved yet; however, many diversity measures have been proposed so far. On the one hand, it is intuitive that increasing diversity should lead to better accuracy of the combined system, but there is no formal proof of this dependency [61], as confirmed by the wide range of experimental results presented, e.g., in [62]. In [53] the authors decomposed the error of classification by majority voting into individual accuracy, good diversity and bad diversity. The good diversity has a positive impact on ensemble error reduction, whereas the bad diversity has the opposite effect. Sharkey et al. [55] proposed a hierarchy of four levels of diversity according to the answer of the majority rule, coincident failures, and the possibility of at least one correct answer of the ensemble members. Brown et al. [58] argue that this hierarchy is not appropriate when the ensemble diversity varies between feature subspaces. They formulated the following taxonomy of diversity measures:

- Pairwise measures, averaging a measure between each classifier pair in an ensemble, such as the Q-statistic [58], kappa-statistics [63], disagreement [64] and the double-fault measure [61,65].
- Non-pairwise diversity measures, comparing the outputs of a given classifier and the entire ensemble, such as the Kohavi–Wolpert variance [66], a measure of inter-rater (inter-classifier) reliability [67], the entropy measure [68], the measure of difficulty [8], generalized diversity [52], and coincident failure diversity [69].

The analysis of several diversity measures in [70], relating them to the concept of classifiers' margin, showed their limitations and the source of confusing empirical results. They relate classifier selection to an NP-complete matrix cover problem, implying that ensemble design is in fact a quite difficult combinatorial problem. Diversity measures are usually employed to select the most valuable sub-ensemble in ensemble pruning processes [71]. To deal with the high computational complexity of ensemble pruning, several hybrid approaches have been proposed, such as heuristic techniques [72,73], evolutionary algorithms [74,75], reinforcement learning [76], and competitive cross-validation techniques [77]. For classification tasks, the cost of acquiring feature values (which could be interpreted as the price for an examination or the time required to collect the data for decision making) can be critical. Some authors take it into consideration during the component classifier selection step [78,79].
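As a minimal illustration of the pairwise measures listed above, the sketch below computes the disagreement measure and the Q-statistic from the correctness indicators of two classifiers on a validation set; the toy correctness vectors and helper names are our own assumptions, not taken from the cited papers.

```python
import numpy as np

def pairwise_diversity(correct_i, correct_j):
    """Disagreement and Q-statistic for two classifiers, given boolean vectors
    that mark which validation objects each classifier labels correctly."""
    correct_i = np.asarray(correct_i, dtype=bool)
    correct_j = np.asarray(correct_j, dtype=bool)
    n11 = np.sum(correct_i & correct_j)      # both correct
    n00 = np.sum(~correct_i & ~correct_j)    # both wrong
    n10 = np.sum(correct_i & ~correct_j)     # only the first correct
    n01 = np.sum(~correct_i & correct_j)     # only the second correct
    n = n11 + n00 + n10 + n01
    disagreement = (n10 + n01) / n
    q_statistic = (n11 * n00 - n01 * n10) / (n11 * n00 + n01 * n10 + 1e-12)
    return disagreement, q_statistic

# toy example: correctness of two ensemble members on ten validation objects
a = [1, 1, 0, 1, 1, 0, 1, 0, 1, 1]
b = [1, 0, 1, 1, 0, 0, 1, 1, 1, 0]
print(pairwise_diversity(a, b))   # a low Q-statistic indicates more diverse errors
```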
3.2. Ensuring diversity

According to [22,38] we can enforce the diversity of a classifier pool by the manipulation of either the individual classifier inputs, outputs, or models.

3.2.1. Diversifying input data

This diversification strategy assumes that classifiers trained on different (disjoint) input subspaces become complementary. Three general strategies are identified:

1. Using different data partitions.
2. Using different sets of features.
3. Taking into consideration the local specialization of individual classifiers.

Data partitions. These may be motivated by several reasons, such as data privacy, or the need to learn over distributed data chunks stored in different databases [80–82]. Regarding data privacy, we should notice that using distributed data may come up against legal or commercial constraints which do not allow sharing raw datasets and merging them into a common repository [37]. To ensure privacy we can train individual classifiers on each database independently and merge their outputs using hybrid classifier principles [83]. The distributed data paradigm is strongly connected with the big data analysis problem [84]. A huge database may make it impossible to deliver trained classifiers under specified time constraints, forcing a resort to sampling techniques to obtain manageable dataset partitions. A well known approach is the cross-validated committee, which requires minimizing the overlap of dataset partitions [56]. Providing individualized training datasets for each classifier is convenient in the case of a shortage of learning examples. The most popular techniques, such as bagging [12] or boosting [14,19,64,85], have their origin in bootstrapping [13]. These methods try to ascertain whether a set of weak classifiers may produce a strong one. Bagging applies sampling with replacement to obtain independent training datasets for each individual classifier. Boosting modifies the input data distribution perceived by each classifier from the results of the classifiers trained before, focusing on difficult samples, and makes the final decision by a weighted voting rule.

Data features. These may be selected to ensure diverse training of a pool of classifiers. The Random Subspace method [86,87] was employed for several types of individual classifiers, such as decision trees (Random Forest) [88], linear classifiers [89], or minimal distance classifiers [90,91]. It is worth pointing out the interesting propositions dedicated to the one-class classifier presented by Nanni [92], or a hierarchical method of ensemble forming, based on feature space splitting and then assigning two-class classifiers (i.e., Support Vector Machines) locally, presented in [93,94]. Attribute Bagging [95] is a wrapper method that establishes the appropriate size of a feature subset, and then creates random projections of a given training set by random selection of feature subsets. The classifier ensemble is trained on the basis of the obtained set.

Local specialization. It is assumed for classifier selection, selecting the best single classifier from a pool of classifiers trained over each partition of the feature space. It gives the MCS answer for all objects included in the partition [7]. Some proposals assume classifier local specialization, providing only locally optimal solutions [38,96–98,72], while others divide the feature space, selecting (or training) a classifier for each partition. Static and dynamic approaches are distinguished:

- Static classifier selection [99]: the relation between a region of competence and the assigned classifier is fixed. Kuncheva's Clustering and Selection algorithm [100] partitions the feature space by a clustering algorithm, and selects the best individual classifier for each cluster according to its local accuracy. The Adaptive Splitting and Selection algorithm in [101] partitions the feature space and assigns classifiers to each partition in one integrated process. The main advantage of AdaSS is that the training algorithm considers an area contour to determine the classifier content and, conversely, that the region shapes adapt to the competencies of the classifiers. Additionally, majority voting or more sophisticated rules are proposed as the combination method of the area classifiers [102]. Lee et al. [103] used the fuzzy entropy measure to partition the feature space and select the relevant features with good separability for each partition.
- Dynamic classifier selection: the competencies of the individual classifiers are calculated during classification operation [104–107]. There are several interesting proposals which extend this concept, e.g., by using a preselected committee of individual classifiers and making the final decision on the basis of a voting rule [108]. In [109,110] the authors propose dynamic ensemble selection based on an original competence measure using the classification of a so-called random reference classifier.

Both static [111–113] and dynamic [114–116] classifier specialization are widely used for data stream classification.

3.2.2. Diversifying outputs

MCS diversity can be enforced by the manipulation of the individual classifier outputs, so that an individual classifier is designed to classify only some classes in the problem. The combination method should restore the whole class label set; e.g., a multi-class classification problem can be decomposed into a set of binary classification problems [117,118]. The most popular propositions of two-class classifier combinations are OAO (one-against-one) and OAA (one-against-all) [119], where at least one predictor relates to each class. The model that a given object belongs to a chosen class is tested against the alternative of the feature vector belonging to any other class. In the OAA method, a classifier is trained to separate a chosen class from the remaining ones. OAA returns the class with maximum support. In more general approaches, the combination of individual outputs is made by finding the closest class, in some sense, to the code given by the outputs of the individual classifiers. The ECOC (Error Correcting Output Codes) model was proposed by Dietterich and Bakiri [118], who assumed that a set of classifiers produces a sequence of bits which is related to code-words during training. ECOC points at the class with the smallest Hamming distance to its codeword. Passerini et al. showed the advantages of this method over traditional ones for ensembles of support vector machines [120].

Recently several interesting propositions on how to combine the binary classifiers were proposed. Wu et al. [121] used pairwise coupling, Friedman employed the Max-Win rule [122], and Hüllermeier proposed an adaptive weighted voting procedure [123]. A comprehensive recent survey of binary classifier ensembles is [124]. It is worth mentioning the one-class classification model, which is a special case of the binary classifier trained in the absence of counterexamples. Its main goal is to model normality in order to detect anomalies or outliers from the target class [125]. To combine such classifiers the typical methods developed for binary ones are used [126], but it is worth mentioning the work by Wilk and Wozniak, where the authors restored a multi-class classification task using a pool of one-class classifiers and a fuzzy inference system [127]. The combination methods dedicated to one-class classifiers still await proper attention [128].
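A minimal sketch of the output decomposition described in this subsection: a one-against-all code matrix (a special case of ECOC) is used to train one binary learner per bit, and decoding picks the class whose codeword has the smallest Hamming distance to the produced bit string. The base learner, toy data and helper names are illustrative assumptions rather than the exact methods of [118,119].

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_ecoc(X, y, code_matrix):
    """Train one binary learner per code-matrix column; row c of the matrix is
    the codeword of class c (entries in {0, 1})."""
    learners = []
    for bit in range(code_matrix.shape[1]):
        binary_targets = code_matrix[y, bit]        # relabel each sample by its class bit
        learners.append(LogisticRegression(max_iter=1000).fit(X, binary_targets))
    return learners

def predict_ecoc(X, learners, code_matrix):
    """Decode by choosing the class whose codeword is closest in Hamming distance."""
    bits = np.column_stack([clf.predict(X) for clf in learners])
    distances = np.abs(bits[:, None, :] - code_matrix[None, :, :]).sum(axis=2)
    return distances.argmin(axis=1)

# toy usage: 3 classes, a one-against-all code matrix (identity rows)
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 2)) + np.repeat(np.eye(3) * 3, 50, axis=0)[:, :2]
y = np.repeat(np.arange(3), 50)
code = np.eye(3, dtype=int)                         # OAA is a special case of ECOC
models = train_ecoc(X, y, code)
print(predict_ecoc(X[:5], models, code))
```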
3.2.3. Diversifying models

Ensembles with individual classifiers based on different classification models take advantage of the different biases of each classifier model; however, the combination rule should be carefully chosen. We can combine the class labels, but in the case of continuous outputs we have to normalize them, e.g., using a fuzzy approach [127]. We could use different versions of the same model as well, because many machine learning algorithms do not guarantee finding the optimal classifier, so combining the results of various initializations may give good results. Alternatively, a pool of classifiers can be produced by noise injection. Regarding neural networks [129], it is easy to train pools of networks where each of them is trained starting from randomly chosen initial weights. Regarding decision trees, we can choose randomly the test for a given node among the possible tests, according to the value of a splitting criterion.

4. Fuser design

Some works consider the answers from a given Oracle as the reference combination model [130]. The Oracle is an abstract combination model, built such that if at least one of the individual classifiers provides the correct answer, then the MCS committee outputs the correct class too. Some researchers used the Oracle in comparative experiments to provide a performance upper bound for the classifier committee [10] or for information fusion methods [131]. A simple example shows the risks of the Oracle model: assume we have two classifiers for a binary class problem, a random one and another that always returns the opposite decision; hence the Oracle will always return the correct answer. As a consequence the Oracle model does not fit in the Bayesian paradigm. Raudys [132] noticed that the Oracle is a kind of quality measure of a given individual classifier pool. Let us systematize the methods of classifier fusion, which on the one hand could use class labels or support functions, and whose combination rules, on the other hand, could be given or be the result of training. The taxonomy of decision fusion strategies is depicted in Fig. 5.

4.1. Class label fusion

Early algorithms performing fusion of classifier responses [9,10,61] only implemented majority voting schemes in three main versions [22]:

- unanimous voting, so that the answer requires that all classifiers agree,
- simple majority, so that the answer is given if the majority is greater than half the pool of classifiers,
- majority voting, taking the answer with the highest number of votes.

The expected error of majority voting (for independent classifiers with the same quality) was estimated in 1794 according to Bernoulli's equation, proven as the Condorcet Jury Theorem [5]. Later works focused on the analytically derived classification performance of combined classifiers, which holds only when strong conditions are met [8], so that they are not useful from a practical point of view. Alternative voting methods weight differently the decisions coming from different committee members [22,133]. The typical architecture of a combined classifier based on class labels is presented in the left diagram of Fig. 6. In [134] the authors distinguished the types of weighted voting depending on the classifier, on both the classifier and the class, and, finally, on the feature values, the classifier and the class. Anyway, none of these models can improve over the Oracle. To achieve that we need additional information, such as the feature values [132,135,136], as depicted in the right diagram of Fig. 6.

4.2. Support function fusion

The support function fusion system architecture is depicted in Fig. 7. Support functions provide a score for the decision taken by an individual classifier. The value of a support function is the estimated likelihood of a class, computed either as a neural network output, an a posteriori probability, or a fuzzy membership function. First to be mentioned, the Borda count [11] computes a score for each class on the basis of its ranking by each individual classifier. The most popular form of support function is the a posteriori probability [26], produced by the probabilistic models embodied by the classifiers [137–139]. There are many works following this approach, such as the optimal projective fuser of [140], the combination of neural network outputs according to their accuracy [141], and Naïve Bayes as the MCS combination method [142].

Some analytical properties and experimental evaluations of aggregating methods were presented in [10,31,143,144]. The aggregating methods use simple operators such as the supremum or the mean value. They do not involve learning. However, they have little practical applicability because of the hard conditions they impose [145]. The main advantage of aggregation is that it counteracts over-fitting of the individual classifiers. According to [134], the following types of weighted aggregation can be identified, depending on: (a) only the classifier id, (b) the classifier and the feature vector, (c) the classifier and the class, and (d) the classifier, the class, and the feature vector. For two-class recognition problems only the last two types of aggregation allow producing a compound classifier which may improve over the Oracle. For many-class problems, it is possible to improve over the Oracle [131] using any of these aggregation methods. Finally, another salient approach is the mixture of experts [146,147], which combines classifier outputs using a so-called input dependent gating function. Tresp and Taniguchi [148] proposed a linear function for this fuser model, and Cheeseman [149] proposed a mixture of Gaussians.
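As a minimal sketch of support function fusion with weights depending only on the classifier id (type (a) in the terminology of [134]), the snippet below averages posterior estimates with classifier-level weights; using validation accuracies as weights and the array shapes shown are illustrative assumptions, not a prescription from the cited works.

```python
import numpy as np

def weighted_soft_vote(posteriors, weights):
    """Fuse class supports by a weighted average of the individual posterior
    estimates; `posteriors` has shape (n_classifiers, n_samples, n_classes)."""
    posteriors = np.asarray(posteriors, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()                       # normalise classifier weights
    fused = np.tensordot(weights, posteriors, axes=(0, 0))  # (n_samples, n_classes)
    return fused.argmax(axis=1), fused

# toy example: three classifiers, two objects, three classes
p = [
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],   # supports from classifier 1
    [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]],   # supports from classifier 2
    [[0.2, 0.6, 0.2], [0.1, 0.2, 0.7]],   # supports from classifier 3
]
w = [0.9, 0.7, 0.6]                        # e.g., validation accuracies used as weights
labels, supports = weighted_soft_vote(p, w)
print(labels, supports.round(3))
```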
Fig. 6. Architecture of the MCS making a decision on the basis of class label fusion only (left diagram). The right diagram corresponds to an MCS using additional information from the feature values.
Fig. 7. Architecture of the MCS which computes the decision on the basis of support function combination.
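Relating back to the Condorcet Jury Theorem mentioned in Section 4.1, the short computation below evaluates the probability that a simple majority of independent, equally accurate voters is wrong (a binomial tail); the chosen individual accuracy of 0.6 and committee sizes are only example values.

```python
from math import comb

def majority_vote_error(n_classifiers, p_individual):
    """Probability that a simple majority of independent, equally accurate
    classifiers is wrong, as considered by the Condorcet Jury Theorem."""
    p_err = 1.0 - p_individual
    k_needed = n_classifiers // 2 + 1          # wrong votes needed to mislead an odd committee
    return sum(comb(n_classifiers, k) * p_err**k * p_individual**(n_classifiers - k)
               for k in range(k_needed, n_classifiers + 1))

# with p = 0.6 per member, the committee error shrinks as the committee grows
for n in (1, 3, 11, 51):
    print(n, round(majority_vote_error(n, 0.6), 4))
```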
analysis [154]. Other trainable fuser methods may be strictly related to ensemble pruning methods, when the authors use some heuristic search algorithm to select the classifier ensemble, as in [72,141], according to the chosen fuser.

We also have to mention the group of combination methods built from pools of heterogeneous classifiers, i.e., using different classification models, such as stacking [155]. This method trains the combination block using the individual classifier outputs presented during classification of the whole training set. Most of the combination methods do not take into consideration possible relations among individual classifiers. Huang and Suen [156] proposed the Behavior-Knowledge Space method, which aggregates the individual classifiers' decisions on the basis of a statistical approach.

5. Concept Drift

Before entering the discussion of practical applications we consider a very specific topic of real life relevance, which is known as Concept Drift in knowledge engineering domains, or non-stationary processes in signal processing and statistics domains. Most of the conventional classifiers do not take this phenomenon into consideration. Concept Drift means that the statistical dependencies between the object features and its classification may change in time, so that future data may be badly processed if we maintain the same classification, because the object category or its properties will be changing. Concept drift occurs frequently in real life [157]. MCS are especially well suited to deal with Concept Drift. Machine learning methods in security applications (like spam filters or IDS/IPS) [158] or decision support systems for marketing departments [159] require taking into account new training data with potentially different statistical properties [116]. The occurrence of Concept Drift decreases the true classification accuracy dramatically. The most popular approaches are the Streaming Ensemble Algorithm (SEA) [111] and the Accuracy Weighted Ensemble (AWE) [160]. Incoming data are collected in data chunks, which are used to train new models. The individual classifiers are evaluated by their accuracy on the new data. The best performing classifiers are selected to constitute the MCS committee in the next time epoch. As the decision rule, the SEA uses majority voting, whereas the AWE uses a weighted voting strategy. Kolter et al. present the Dynamic Weighted Majority (DWM) algorithm [114], which modifies the decision combination weights and updates the ensemble according to the number of incorrect decisions made by the individual classifiers. When a classifier weight is too small, it is removed from the ensemble, and a new classifier is trained and added to the ensemble in its place.

A difficult problem is drift detection, which is the problem of deciding that Concept Drift has taken place. A current research direction is to propose an additional binary classifier giving the decision to rebuild the classifiers. The drift detector can be based on changes in the probability distribution of the instances [161–163] or on the classification accuracy [164,165]. Not all classification algorithms dealing with concept drift require drift detection, because they can adjust the model to incoming data [166].
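A minimal sketch of the chunk-based update loop shared by SEA/AWE-style stream ensembles described above: each chunk trains a new member, all members are re-weighted by their accuracy on the newest chunk, and the weakest member is pruned. The base learner, chunk size, ensemble size and weighting rule are illustrative assumptions rather than the exact published algorithms.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class ChunkWeightedEnsemble:
    """Keep at most `max_size` members; on every chunk, train a new model,
    re-weight all members by accuracy on the chunk, drop the weakest."""

    def __init__(self, max_size=5):
        self.max_size = max_size
        self.members, self.weights = [], []

    def partial_fit_chunk(self, X_chunk, y_chunk):
        new_model = DecisionTreeClassifier(max_depth=5).fit(X_chunk, y_chunk)
        self.members.append(new_model)
        # weight every member by its accuracy on the newest chunk (drift-aware)
        self.weights = [m.score(X_chunk, y_chunk) for m in self.members]
        if len(self.members) > self.max_size:              # prune the weakest member
            worst = int(np.argmin(self.weights))
            del self.members[worst], self.weights[worst]
        return self

    def predict(self, X):
        votes = np.array([m.predict(X) for m in self.members])   # (n_members, n_samples)
        classes = np.unique(votes)
        support = np.zeros((len(classes), votes.shape[1]))
        for w, row in zip(self.weights, votes):
            for ci, c in enumerate(classes):
                support[ci] += w * (row == c)                     # weighted voting
        return classes[support.argmax(axis=0)]

# toy stream with an abrupt concept change half-way through
rng = np.random.default_rng(2)
ens = ChunkWeightedEnsemble(max_size=4)
for t in range(10):
    X = rng.normal(size=(100, 3))
    y = (X[:, 0] > 0).astype(int) if t < 5 else (X[:, 1] > 0).astype(int)
    ens.partial_fit_chunk(X, y)
print(ens.predict(rng.normal(size=(5, 3))))
```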
6. Applications

Reported applications of classifier ensembles have grown astoundingly in recent years due to the increase in computational power, which allows training large collections of classifiers within practical application time constraints. A recent review appears in [18]. Sometimes the works combine diverse kinds of classifiers, the so-called heterogeneous MCS. Homogeneous MCS, such as Random Forest (RF), are composed of classifiers of the same kind. In the works reviewed below, the basic classifiers are Multi-Layer Perceptron (MLP), k-Nearest Neighbor (kNN), Radial Basis Function (RBF), Support Vector Machine (SVM), Probabilistic Neural Network (PNN), and Maximum Likelihood (ML) classifiers.

We review in this section recent applications to remote sensing data, computer security, financial risk assessment, fraud detection, recommender systems, and medical computer aided diagnosis.

6.1. Remote sensing

The main problems addressed by MCS in remote sensing domains are land cover mapping and change detection. Land cover mapping consists in the identification of the materials that are on the surface of the area being covered. Depending on the application, a few general classes may be identified, i.e., vegetation, water, buildings, roads, or a more precise classification can be required, i.e., identifying tree or crop types. Applications include agriculture, forestry, geology, urban planning, and infrastructure degradation assessment. Change detection consists in the identification of places where the land cover has changed in time; it implies computation over time series of images. Change detection may or may not be based on previous or separate land cover maps. Remote sensing classification can be done on a variety of data sources, sometimes performing fusion of different data modalities. Optical data has better interpretability by humans, but land is easily occluded by weather conditions, i.e., cloud formations. Hyperspectral sensing provides high-dimensional data at each image pixel, with high spectral resolution. Synthetic Aperture Radar (SAR) is not affected by weather or other atmospheric conditions, so that observations are better suited for continuous monitoring of seasonally changing land covers. SAR can also provide multivariate data from varying radar frequencies. Other data sources are elevation maps and other ancillary information, such as the measurements of environmental sensors.

6.1.1. Land cover mapping

An early application of MCS to land cover mapping consisted in overproducing a large set of classifiers and searching for the optimal subset [38,65,167]. To avoid the combinatorial complexity, the approach performs clustering of the classifier errors, aggregating similar classifiers. The approach was proven to be optimal under some conditions on the classifiers. Interestingly, testing was performed on multi-source data, composing the pixel's feature vector by joining multi-spectral with radar data channels, to compute the land cover map. The MCS was heterogeneous, composed of MLP, RBF, and PNN.

The application of RF to processing remote sensing data has been abundant in the literature. It has been applied to estimate land cover on Landsat data over Granada, Spain [168] and on multi-source data in a Colorado mountainous area [169]. Specifically, Landsat Multi-Spectral, elevation, slope and aspect data are used as input features. The RF approach is able to successfully fuse these inhomogeneous information sources. Works on hyperspectral images acquired by the HyMap sensor have been addressed to build vegetation thematic maps [170], comparing RF and decision tree-based AdaBoost, as well as two feature selection methods: the out-of-bag and a best-first search wrapper feature subset selection method. Diverse feature subsets are tested, and the general conclusion is that tree ecotopes are better discriminated than grass ecotopes. Further work with RF has been done assessing the uncertainty in modeling the distribution of vegetation types [171], performing classification on the basis of environmental variables, in an approach that combines spatial distribution modeling by spatial interpolation, using sequential Gaussian simulation and the clustering of species into vegetation types. Dealing with labeled data scarcity, there are methods [172] based on the combination of RF and the enrichment of the training dataset with artificially generated samples in order to increase classifier diversity, applied to Landsat multispectral data. Artificial data is generated from Gaussian modeling of the data distribution. The application of RF to SAR multitemporal data aims to achieve season-invariant detection of several classes of land cover, i.e., grassland, cereal, forest, etc. [173]. RF performed best, with the lowest spatial variability. Images were coregistered and some model portability was tested, where the model trained on one SAR image was applied on other SAR images of the same site obtained at different times. The success of RF for remote sensing images has prompted the proposal of a specific computational environment [174].

Ensembles of SVM have also been applied to land cover mapping. Indeed, the ground truth data scarcity has been attacked by an active learning approach to semi-supervised SVM training [175]. The active learning approach is based on the clustering of the unlabeled data samples according to the clustering of the SVM outputs on the current training dataset. Samples with higher membership coefficients are added to the corresponding class data, and the classifier is retrained in an iterative process. These semi-supervised SVM are combined in a majority voting ensemble and applied to the classification of SPOT and Landsat optical data. Land cover classification in the specific context of shallow waters has the additional difficulties of the scattering, refraction and reflection effects introduced by the water cover. A robust process combines a parallel and a serial architecture [176], where initial classification results obtained by SVM are refined in a second SVM classifier and the final result is given by a linear combination of two ensembles of SVM classifiers and a minimum distance classifier. Besides, the system estimates the water depth by a bathymetry estimation process. The approach is applied to Landsat images for the estimation of coral population in coastal waters. Polarimetric SAR data used for the classification of Boreal forests require an ensemble of SVM [177]. Each of the SVM is specifically tuned to a class, with a specific feature selection process. Best results are obtained when multi-temporal data is used, joining two images from two different seasons (summer and winter) and performing the feature selection and training on the joint data vectors.

6.1.2. Change detection

Early application of MCS to land cover change detection was based on non-parametric algorithms, specifically MLP, k-NN, RBF, and ML classifiers [178,179], where classifier fusion was performed either by majority voting, Bayesian averaging, or maximum a posteriori probability. Testing data were Thematic Mapper multispectral images and the Synthetic Aperture Radar (SAR) of the Landsat 5 satellite. Recent works on change detection in panchromatic images with MCS follow three different decision fuser strategies: majority voting, Dempster-Shafer evidence theory, and the Fuzzy Integral [180]. The sequential processing of the images previous to classification includes pan-sharpening of the multi-temporal images, co-registration, raw radiometric change detection by image subtraction and automatic thresholding, and a final MCS decision computed on the multi-spectral data and the change detection data obtained from the various pan-sharpening approaches.

6.2. Computer security

Computer security is at the core of most critical services nowadays, from universities and banking to companies and communications. Secure information processing is a growing concern, and machine learning approaches are trying to provide predictive solutions that may allow avoiding the negative impact of such attacks. Here we introduce some of the problems, with current solutions proposed from the MCS paradigm.
6.2.1. Distributed denial of service

Distributed denial of service (DDoS) attacks are among the most threatening attacks that an Internet Service Provider may face. Distributed service providers, such as military applications, e-healthcare and e-governance, can be very sensitive to this type of attack, which can produce network performance degradation, service unavailability, and revenue loss. There is a need for intelligent systems able to discriminate legitimate flash crowds from an attack. A general architecture for automatic detection of DDoS attacks is needed, where the attack detection may be performed by an MCS. The MCS constituent classifiers may be ANNs trained with robust learning algorithms, i.e., Resilient Back Propagation (RBP). Specifically, a boosting strategy is defined on the ensemble of RBP-trained ANNs, and a Neyman-Pearson approach is used to make the final decision [181]. This architecture may also be based on Sugeno Adaptive Neuro-Fuzzy Inference Systems (ANFIS) [182]. A critical issue of the approach is the need to report validation results, which can only be based on recorded real life DDoS attacks. There are some publicly available datasets to perform and report these results. However, results reported on these datasets may not be informative of the system performance on new attacks, which may have quite different features. This is a pervasive concern in all security applications of machine learning algorithms.

6.2.2. Malware

Detection of malicious code, such as trojans, viruses and spyware, by anti-virus approaches can only be performed after some instance of the code has been analyzed and some kind of signature found; therefore some degree of damage has already been done. Predictive approaches based on machine learning techniques may allow anticipative detection at the cost of some false positives. Classifiers learn patterns in the known malicious codes, extrapolating to yet unseen codes. A taxonomy of such approaches is given in [183], describing the basic code representations by byte and opcode n-grams, strings, and others like portable executable features. Feature selection processes, such as the Fisher score, are applied to find the most informative features. Finally, the classifiers tested in this problem include a wide variety of MCS combining diverse base classifiers with all standard fuser designs. Results have been reported showing that MCS overcome other approaches and are better suited for the active learning needed to keep the classifiers updated and tuned to the changing malicious code versions.

6.2.3. Intrusion detection

Intrusion Detection and Intrusion Prevention deal with the identification of intruder code in a networked environment via the monitoring of communication patterns. Intruder detection performed as an anomaly detection process allows detecting previously unseen patterns, at the cost of false alarms, contrary to signature-based approaches. The problem is attacked by modular MCS whose compounding base classifiers are one-class classifiers built by the Parzen window probability density estimation approach [128]. Each module is specialized in a specific protocol or network service, so that different thresholds can be tuned for each module, allowing some optimization of the false alarm rate. On the other hand, Intrusion Prevention tries to impede the execution of the intruder code by fail-safe semantics, automatic response and adaptive enforcement. One approach relies on the fact that Instruction Set Randomization prevents code injection attacks, so that detected injected code can be used for adaptation of the anomaly classifier and the signature-based filtering [184]. Clustering of n-grams is performed to obtain a model of the normal communication behavior which is accurate enough to allow zero-day detection of worm infection even in the case of low payload or slow penetration [185]. An interesting hybrid intrusion detection proposal was presented in [186], where decision trees and support vector machines are combined as a hierarchical hybrid intelligent system model.

6.2.4. Wireless sensor networks

Wireless sensor networks (WSNs) are collections of inexpensive, low power devices deployed over a geographical space for monitoring, measuring and event detection. Anomalies in the WSN can be due to failures in software or hardware, or to malicious attacks compelling the sensors to bias or drop their information and measurements. Anomaly detection in WSN is performed using an ensemble of binary classifiers, each tuned on diverse parameters and built following a different approach (average, autoregressive, neural network, ANFIS). The decision is made by a weighted combination of the classifier outputs [187].

6.3. Banking, credit risk, fraud detection

In the current economic situation, the intelligent processing of financial information, the assessment of financial or credit risks, and related issues have become a prime concern for society and for the computational intelligence community. Developing new tools may allow avoiding in the future the dire problems faced today by society. In this section we review some of the most important issues, gathering current attempts to deal with them.

6.3.1. Fraud detection

Fraud detection involves identifying fraud as soon as possible after it has been perpetrated. Fraud detection [188] is a big area of research and applications of machine learning, which has provided techniques to counteract fraudsters in credit card fraud, money laundering, telecommunications fraud, and computer intrusion. MCS have also been applied successfully in this domain. A key task is modeling the normal behavior in order to be able to establish suspicion scores for outliers. Probabilistic networks are specific one-class classifiers that are well suited to this task, and bagging of probabilistic networks has been proposed as a general tool for fraud detection because the MCS approach improves the robustness of the normal behavior modeling [189].

6.3.2. Credit card fraud

Specific works on credit card fraud detection use real-life data of transactions from an international credit card operation [190]. The exploration of the sensitivity to the fraud to non-fraud ratio of the random undersampling approach used to deal with unbalanced class sizes is required to validate the approaches. Comparing RF against SVM and logistic regression [190], RF was the best performer in all experimental conditions as measured by almost all performance measurements. Other approaches to this problem include a bagged ensemble of SVM tested on a British card application approval dataset [191].

6.3.3. Stock market

Trade-based stock market manipulation tries to influence stock values simply by buying and then selling. It is difficult to detect because rules for detection quickly become outdated. An innovative research track is the use of peer-group analysis for trade-based stock manipulation detection, based on the detection of outliers whose dynamic behavior separates from that of previously similar stock values, its peers [192]. Dynamic clustering allows tracking in time the evolution of the community of peers related to the stocks under observation, and outlier detection techniques are required to detect the manipulation events.

6.3.4. Credit risk

Credit risk prediction models seek to predict whether an individual will default on a loan or not. The task is greatly affected by the unavailability, scarcity and incompleteness of data. The application of machine learning to this problem includes the evaluation of bagging, boosting, stacking as well as other conventional classifiers
the aging of populations around the world. Diverse MCS approaches have been applied to structural MRI data, specifically for the classification of Alzheimer's disease patients, such as an RVM-based two-stage pipeline [45], variations of AdaBoost [215], and hybridizations of kernel and Dendritic Computing approaches [216]. Classifier ensembles have been applied to the classification of fMRI data [217,218] and to its visual decoding [219], which is the reconstruction of the visual stimuli from the fMRI data.

6.5. Recommender systems

Nowadays, recommender systems are the focus of intense research [220]. They try to help consumers select the products that may be interesting for them based on their previous searches and transactions, but such systems are expanding beyond typical sales. They are used to predict which mobile telephone subscribers are at risk of switching to another provider, or to advise conference organizers about assigning papers to peer reviewers [221]. Burke [222] proposed hybrid recommender systems combining two or more recommendation techniques to improve performance, avoiding the drawbacks of an individual recommender. Similar observations were confirmed by Balabanovic et al. [223] and Pazzani [224], who demonstrated that hybrid method recommendations improve collaborative and content-based approaches.

There are several interesting works which apply the hybrid and combined approach to recommender systems. Jahrer and Töscher [225] demonstrated the advantage of ensemble learning applied to the combination of different collaborative filtering algorithms on the Netflix Prize dataset. Porcel et al. [226] developed a hybrid fuzzy recommender system to help disseminate information about research resources in the field of interest of a user. Claypool et al. [227] performed a linear combination of the ratings obtained from individual recommender systems into one final recommendation, while Pazzani proposed to use a voting scheme [224]. Billsus and Pazzani [228] selected the best recommendation on the basis of a recommendation quality metric such as the level of confidence, while Tran and Cohen [229] preferred the individual recommender which is most consistent with the previous ratings of the user. Kunaver et al. [230] proposed a Combined Collaborative Recommender based on three different collaborative recommender techniques. Goksedef and Gundoz-Oguducu [231] combined the results of several recommender techniques based on Web usage mining.

7. Final remarks

We have summarized the main research streams on multiple classifier systems, also known in the literature as combined classifiers or classifier ensembles. Such hybrid systems have recently been the focus of intense research, so fruitful that our review could not be exhaustive. Key issues related to the problem under consideration are classifier diversity and methods of classifier combination.

Diversity is believed to provide improved accuracy and classifier performance. Most works try to obtain maximum diversity by different means: introducing classifier heterogeneity, bootstrapping the training data, randomizing feature selection, randomizing subspace projections, boosting the data weights, and many combinations of these ideas. Nowadays, the diversity hypothesis has not been fully proven, either theoretically or empirically. However, the fact is that MCSs show in most instances improved performance, resilience and robustness to high data dimensionality and diverse forms of noise, such as labeling noise.

There are several propositions on how to combine the classifier outputs, as presented in this work; nonetheless, we point out that classifier combination is not the only way to produce hybrid classifier systems. We envisage further possibilities of hybridization, such as:

- Merging the raw data from different sources into one repository and then training the classifier.
- Merging the raw data and prior expert knowledge (e.g., learning sets and human expert rules, to improve the rules on the basis of incoming data).
- Merging prior expert knowledge and classification models returned by machine learning procedures.

For such problems we have to take into consideration issues related to data privacy, and to computational and memory efficiency.

Acknowledgments

We would like to thank the anonymous reviewers for their diligent work and efficient efforts. We are also grateful to the Editor-in-Chief, Prof. Belur V. Dasarathy, who encouraged us to write this survey for this prestigious journal.

Michał Woźniak was supported by the Polish National Science Centre under Grant No. N519 576638, realized in the years 2010–2013.

References

[1] J. Neumann, The Computer and the Brain, Yale University Press, New Haven, CT, USA, 1958.
[2] A. Newell, Intellectual issues in the history of artificial intelligence, in: F. Machlup, U. Mansfield (Eds.), The Study of Information: Interdisciplinary Messages, John Wiley & Sons Inc., New York, NY, USA, 1983, pp. 187–294.
[3] D. Wolpert, The supervised learning no-free-lunch theorems, in: Proceedings of the 6th Online World Conference on Soft Computing in Industrial Applications, 2001, pp. 25–42.
[4] C.K. Chow, Statistical independence and threshold functions, IEEE Transactions on Electronic Computers EC-14 (1) (1965) 66–68.
[5] L. Shapley, B. Grofman, Optimizing group judgmental accuracy in the presence of interdependencies, Public Choice 43 (3) (1984) 329–333.
[6] B.V. Dasarathy, B.V. Sheela, A composite classifier system design: concepts and methodology, Proceedings of the IEEE 67 (5) (1979) 708–713.
[7] L. Rastrigin, R.H. Erenstein, Method of Collective Recognition, Energoizdat, Moscow, 1981.
[8] L. Hansen, P. Salamon, Neural network ensembles, IEEE Transactions on Pattern Analysis and Machine Intelligence 12 (10) (1990) 993–1001.
[9] L. Xu, A. Krzyzak, C. Suen, Methods of combining multiple classifiers and their applications to handwriting recognition, IEEE Transactions on Systems, Man and Cybernetics 22 (3) (1992) 418–435.
[10] K. Tumer, J. Ghosh, Analysis of decision boundaries in linearly combined neural classifiers, Pattern Recognition 29 (2) (1996) 341–348.
[11] T. Ho, J.J. Hull, S. Srihari, Decision combination in multiple classifier systems, IEEE Transactions on Pattern Analysis and Machine Intelligence 16 (1) (1994) 66–75.
[12] L. Breiman, Bagging predictors, Machine Learning 24 (2) (1996) 123–140.
[13] R. Schapire, The strength of weak learnability, Machine Learning 5 (2) (1990) 197–227.
[14] Y. Freund, Boosting a weak learning algorithm by majority, Information and Computation 121 (2) (1995) 256–285.
[15] M. Kearns, U. Vazirani, An Introduction to Computational Learning Theory, MIT Press, Cambridge, MA, USA, 1994.
[16] D. Angluin, Queries and concept learning, Machine Learning 2 (4) (1988) 319–342.
[17] A. Jain, R. Duin, M. Jianchang, Statistical pattern recognition: a review, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (1) (2000) 4–37.
[18] N. Oza, K. Tumer, Classifier ensembles: select real-world applications, Information Fusion 9 (1) (2008) 4–20.
[19] R. Polikar, Ensemble based systems in decision making, IEEE Circuits and Systems Magazine 6 (3) (2006) 21–45.
[20] R. Polikar, Ensemble learning, Scholarpedia 3 (12) (2008) 2776.
[21] L. Rokach, Taxonomy for characterizing ensemble methods in classification tasks: a review and annotated bibliography, Computational Statistics and Data Analysis 53 (12) (2009) 4046–4072.
[22] L. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms, Wiley-Interscience, 2004.
[23] L. Rokach, Pattern Classification Using Ensemble Methods, Series in Machine Perception and Artificial Intelligence, World Scientific, 2010.
[24] G. Seni, J. Elder, Ensemble Methods in Data Mining: Improving Accuracy [57] G. Zenobi, P. Cunningham, Using diversity in preparing ensembles of
Through Combining Predictions, Morgan and Claypool Publishers, 2010. classifiers based on different feature subsets to minimize generalization
[25] B. Baruque, E. Corchado, Fusion Methods for Unsupervised Learning error, Machine Learning: ECML 2001 (2001) 576–587.
Ensembles, Springer Verlag New York, Inc., 2011. [58] G. Brown, J. Wyatt, R. Harris, X. Yao, Diversity creation methods: a survey and
[26] R. Duda, P. Hart, D. Stork, Pattern Classification, second ed., Wiley, New York, categorisation, Information Fusion 6 (1) (2005) 5–20.
2001. [59] N. Ueda, R. Nakano, Generalization error of ensemble estimators, in:
[27] E. Alpaydin, Introduction to Machine Learning, second ed., The MIT Press, Proceedings of IEEE International Conference on Neural Networks,
2010. Washington, USA, 1996, pp. 90–95.
[28] C. Bishop, Pattern Recognition and Machine Learning (Information Science [60] G. Brown, J. Wyatt, P. Tiňo, Managing diversity in regression ensembles,
and Statistics), Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006. Journal of Machine Learning Research 6 (2005) 1621–1650.
[29] T. Dietterich, Ensemble methods in machine learning, in: Multiple Classifier [61] L. Kuncheva, C. Whitaker, C. Shipp, R. Duin, Limits on the majority vote
Systems, Lecture Notes in Computer Science, vol. 1857, Springer, Berlin, accuracy in classifier fusion, Pattern Analysis and Applications 6 (2003) 22–
Heidelberg, 2000, pp. 1–15. 31.
[30] G. Marcialis, F. Roli, Fusion of face recognition algorithms for video-based [62] Y. Bi, The impact of diversity on the accuracy of evidential classifier
surveillance systems, in: G.L. Foresti, C. Regazzoni, P. Varshney (Eds.), 2003, ensembles, International Journal of Approximate Reasoning 53 (4) (2012)
pp. 235–250. 584–607.
[31] S. Hashem, Optimal linear combinations of neural networks, Neural Networks [63] D. Margineantu, T. Dietterich, Pruning adaptive boosting, in: Proceedings of
10 (4) (1997) 599–614. the Fourteenth International Conference on Machine Learning, ICML ’97,
[32] R. Clemen, Combining forecasts: a review and annotated bibliography, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1997, pp. 211–218.
International Journal of Forecasting 5 (4) (1989) 559–583. [64] D. Skalak, The sources of increased accuracy for two proposed boosting
[33] J. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Series in algorithms, in: Proceedings of the American Association for Artificial Intelligence,
Machine Learning, Morgan Kaufmann Publishers, 1993. AAAI-96, Integrating Multiple Learned Models Workshop, 1996, pp. 120–125.
[34] T. Wilk, M. Wozniak, Complexity and multithreaded implementation analysis [65] G. Giacinto, F. Roli, Design of effective neural network ensembles for image
of one class-classifiers fuzzy combiner, in: E. Corchado, M. Kurzynski, M. classification purposes, Image and Vision Computing 19 (9-10) (2001) 699–707.
Wozniak (Eds.), Hybrid Artificial Intelligent Systems, Lecture Notes in [66] R. Kohavi, D. Wolpert, Bias plus variance decomposition for zero-one loss
Computer Science, vol. 6679, Springer, Berlin/Heidelberg, 2011, pp. 237–244. functions, in: ICML-96, 1996.
[35] T. Kacprzak, K. Walkowiak, M. Wozniak, Optimization of overlay distributed [67] J. Fleiss, J. Cuzick, The reliability of dichotomous judgments: unequal
computing systems for multiple classifier system – heuristic approach, Logic numbers of judgments per subject, Applied Psychological Measurement 4
Journal of IGPL, doi:10.1093/jigpal/jzr020. (3) (1979) 537–542.
[36] K. Walkowiak, Anycasting in connection-oriented computer networks: [68] P. Cunningham, J. Carney, Diversity versus quality in classification ensembles
models, algorithms and results, International Journal of Applied based on feature selection, in: Proceedings of the 11th European Conference
Mathematics and Computer Sciences 20 (1) (2010) 207–220. on Machine Learning, ECML ’00, Springer-Verlag, London, UK, 2000, pp. 109–
[37] R. Agrawal, R. Srikant, Privacy-preserving data mining, SIGMOD Records 29 116.
(2) (2000) 439–450. [69] C. Shipp, L. Kuncheva, Relationships between combination methods and
[38] G. Giacinto, F. Roli, G. Fumera, Design of effective multiple classifier systems measures of diversity in combining classifiers, Information Fusion 3 (2)
by clustering of classifiers, in: Proceedings of the 15th International (2002) 135–148.
Conference on Pattern Recognition, 2000, vol. 2, 2000, pp. 160–163. [70] E.K. Tang, P.N. Suganthan, X. Yao, An analysis of diversity measures, Machine
[39] T. Ho, Complexity of classification problems and comparative advantages of Learning 65 (1) (2006) 247–271.
combined classifiers, in: Proceedings of the First International Workshop on [71] G. Martínez-Muñoz, D. Hernández-Lobato, A. Suarez, An analysis of
Multiple Classifier Systems, MCS ’00, Springer-Verlag, London, UK, 2000, pp. ensemble pruning techniques based on ordered aggregation, IEEE
97–106. Transactions on Pattern Analysis and Machine Intelligence 31 (2) (2009)
[40] F. Roli, G. Giacinto, Design of Multiple Classifier Systems, World Scientific 245–259.
Publishing, 2002. [72] D. Ruta, B. Gabrys, Classifier selection for majority voting, Information Fusion
[41] L. Lam, Classifier combinations: implementations and theoretical issues, in: 6 (1) (2005) 63–81.
Proceedings of the First International Workshop on Multiple Classifier [73] R. Banfield, L. Hall, K. Bowyer, W. Kegelmeyer, Ensemble diversity measures
Systems, MCS ’00, Springer-Verlag, London, UK, 2000, pp. 77–86. and their application to thinning, Information Fusion 6 (1) (2005) 49–62.
[42] A.F.R. Rahman, M.C. Fairhurst, Serial combination of multiple experts: a [74] Z.-H. Zhou, J. Wu, W. Tang, Ensembling neural networks: many could be
unified evaluation, Pattern Analysis and Applications 2 (1999) 292–311. better than all, Artificial Intelligence 137 (1-2) (2002) 239–263.
[43] G. Fumera, I. Pillai, F. Roli, A two-stage classifier with reject option for text [75] B. Gabrys, D. Ruta, Genetic algorithms in classifier fusion, Applied Soft
categorisation, 5th International Workshop on Statistical Techniques in Computing 6 (4) (2006) 337–347.
Pattern Recognition (SPR 2004), vol. 3138, Springer, Lisbon, Portugal, 2004, [76] I. Partalas, G. Tsoumakas, I. Vlahavas, Pruning an ensemble of classifiers via
pp. 771–779. reinforcement learning, Neurocomputing 72 (7–9) (2009) 1900–1909.
[44] P. Bartlett, M. Wegkamp, Classification with a reject option using a hinge loss, [77] Q. Dai, A competitive ensemble pruning approach based on cross-validation
Journal of Machine Learning Research 9 (2008) 1823–1840. technique, Knowledge-Based Systems (0) (2012), http://dx.doi.org/10.1016/
[45] M. Termenon, M. Graña, A two stage sequential ensemble applied to the j.knosys.2012.08.024.
classification of Alzheimer’s disease based on MRI features, Neural Processing machines for effective detection of microcalcification in breast cancer
Letters 35 (1) (2012) 1–12. machines for effective detection of microcalcification in breast cancer
[46] P. Clark, T. Niblett, The CN2 induction algorithm, Machine Learning 3 (4) diagnosis, in: L. Wang, Y. Jin (Eds.), Fuzzy Systems and Knowledge
(1989) 261–283. Discovery, Lecture Notes in Computer Science, vol. 3614, Springer, Berlin/
[47] R. Rivest, Learning decision lists, Machine Learning 2 (3) (1987) 229–246. Heidelberg, 2005, pp. 483–493.
[48] Y. Freund, R. Schapire, A decision-theoretic generalization of on-line learning [79] K. Jackowski, B. Krawczyk, M. Woniak, Cost-sensitive splitting and selection
and an application to boosting, Journal of Computer and System Sciences 55 method for medical decision support system, in: H. Yin, J.A. Costa, G. Barreto
(1) (1997) 119–139, http://dx.doi.org/10.1006/jcss.1997.1504. (Eds.), Intelligent Data Engineering and Automated Learning – IDEAL 2012,
[49] X. Wu, V. Kumar, J.R. Quinlan, J. Ghosh, Q. Yang, H. Motoda, G.J. McLachlan, A. Lecture Notes in Computer Science, vol. 7435, Springer, Berlin Heidelberg,
Ng, B. Liu, P.S. Yu, Z.-H. Zhou, M. Steinbach, D.J. Hand, D. Steinberg, Top 10 2012, pp. 850–857.
algorithms in data mining, Knowledge and Information Systems 14 (1) (2008) [80] W. Du, Z. Zhan, Building decision tree classifier on private data, in:
1–37, http://dx.doi.org/10.1007/s10115-007-0114-2. Proceedings of the IEEE International Conference on Privacy, Security and
[50] R. Schapire, The strength of weak learnability, Machine Learning 5 (2) (1990) Data Mining – Volume 14, CRPIT ’14, Australian Computer Society, Inc.,
197–227, http://dx.doi.org/10.1023/A:1022648800760. Darlinghurst, Australia, 2002, pp. 1–8.
[51] J. Kivinen, M.K. Warmuth, Boosting as entropy projection, in: Proceedings of [81] B. Krawczyk, M. Wozniak, Privacy preserving models of k-NN algorithm, in:
the Twelfth Annual Conference on Computational Learning Theory, 1999. R. Burduk, M. Kurzynski, M. Wozniak, A. Zolnierek (Eds.), Computer
<http://dl.acm.org/citation.cfm?id=307424>. Recognition Systems 4, Advances in Intelligent and Soft Computing, vol. 95,
[52] D. Partridge, W. Krzanowski, Software diversity: practical statistics for its Springer, Berlin/Heidelberg, 2011, pp. 207–217.
measurement and exploitation, Information and Software Technology 39 (10) [82] Y. Lindell, B. Pinkas, Secure multiparty computation for privacy-preserving
(1997) 707–717. data mining, IACR Cryptology ePrint Archive 2008 (2008) 197.
[53] G. Brown, L. Kuncheva, “Good” and “bad” diversity in majority vote [83] K. Walkowiak, S. Sztajer, M. Wozniak, Decentralized distributed computing
ensembles, in: Proceedings MCS 2010, pp. 124–133. system for privacy-preserving combined classifiers – modeling and
[54] M. Smetek, B. Trawinski, Selection of heterogeneous fuzzy model ensembles optimization, in: B. Murgante, O. Gervasi, A. Iglesias, D. Taniar, B. Apduhan
using self-adaptive genetic algorithms, New Generation Computing 29 (2011) (Eds.), Computational Science and Its Applications – ICCSA 2011, Lecture
309–327. Notes in Computer Science, Vol. 6782, Springer, Berlin/Heidelberg, 2011, pp.
[55] A.J.C. Sharkey, N. Sharkey, Combining diverse neural nets, Knowledge 512–525.
Engineering Review 12 (3) (1997) 231–247. [84] A. Pavlo, E. Paulson, A. Rasin, D. Abadi, D. DeWitt, S. Madden, M. Stonebraker,
[56] A. Krogh, J. Vedelsby, Neural network ensembles, cross validation, and active A comparison of approaches to large-scale data analysis, in: Proceedings of
learning, Advances in Neural Information Processing Systems 7 (1995) 231– the 2009 ACM SIGMOD International Conference on Management of Data,
238. SIGMOD ’09, ACM, New York, NY, USA, 2009, pp. 165–178.
[85] R.E. Schapire, The boosting approach to machine learning: an overview, in: Conference on Knowledge Discovery and Data Mining, KDD ’03, ACM, New
MSRI Workshop on Nonlinear Estimation and Classification, Berkeley, CA, York, NY, USA, 2003, pp. 226–235.
USA, 2001. [113] Y. Zhang, X. Jin, An automatic construction and organization strategy for
[86] T. Ho, Random decision forests, in: Proceedings of the Third International ensemble learning on data streams, SIGMOD Record 35 (3) (2006) 28–33.
Conference on Document Analysis and Recognition (Volume 1)–Volume 1, [114] J. Kolter, M. Maloof, Dynamic weighted majority: a new ensemble method for
ICDAR ’95, IEEE Computer Society, Washington, DC, USA, 1995, pp. 278–. tracking concept drift, in: ICDM 2003, Third IEEE International Conference on
[87] T. Ho, The random subspace method for constructing decision forests, IEEE Data Mining, 2003, 2003, pp. 123–130.
Transactions on Pattern Analysis and Machine Intelligence 20 (1998) 832– [115] A. Tsymbal, M. Pechenizkiy, P. Cunningham, S. Puuronen, Dynamic
844. integration of classifiers for handling concept drift, Information Fusion 9
[88] L. Breiman, Random forests, Machine Learning 45 (1) (2001) 5–32. (1) (2008) 56–68.
[89] M. Skurichina, R. Duin, Bagging, boosting and the random subspace method [116] X. Zhu, X. Wu, Y. Yang, Effective classification of noisy data streams with
for linear classifiers, Pattern Analysis and Applications 5 (2) (2002) 121–135. attribute-oriented dynamic classifier selection, Knowledge Information
[90] G. Tremblay, R. Sabourin, P. Maupin, Optimizing nearest neighbour in random Systems 9 (3) (2006) 339–363.
subspaces using a multi-objective genetic algorithm, in: Proceedings of the [117] D. Tax, R. Duin, Using two-class classifiers for multiclass classification, in:
Pattern Recognition, 17th International Conference on (ICPR’04) Volume 1– Proceedings of the 16th International Conference on Pattern Recognition,
Volume 01, ICPR ’04, IEEE Computer Society, Washington, DC, USA, 2004, pp. 2002, vol. 2, 2002, pp. 124 –127.
208–. [118] T. Dietterich, G. Bakiri, Solving multiclass learning problems via error-
[91] S. Bay, Nearest neighbor classification from multiple feature subsets, correcting output codes, Journal of Artificial Intelligence Research 2 (1995)
Intelligent Data Analysis 3 (3) (1999) 191–209. 263–286.
[92] L. Nanni, Letters: Experimental comparison of one-class classifiers for online [119] K. Duan, S. Keerthi, W. Chu, S. Shevade, A. Poo, Multi-category classification
signature verification, Neurocomputing 69 (7–9) (2006) 869–873. by soft-max combination of binary classifiers, in: Proceedings of the 4th
[93] D. Tao, X. Tang, X. Li, X. Wu, Asymmetric bagging and random subspace for International Conference on Multiple Classifier Systems, MCS’03, Springer-
support vector machines-based relevance feedback in image retrieval, IEEE Verlag, Berlin, Heidelberg, 2003, pp. 125–134.
Transactions on Pattern Analysis Machine Intelligence 28 (7) (2006) 1088– [120] A. Passerini, M. Pontil, P. Frasconi, New results on error correcting output
1099. codes of kernel machines, IEEE Transactions on Neural Networks 15 (1)
[94] K. Ting, J. Wells, S. Tan, S. Teng, G. Webb, Feature-subspace aggregating: (2004) 45–54.
ensembles for stable and unstable learners, Machine Learning 82 (2011) 375– [121] T. Wu, C. Lin, R. Weng, Probability estimates for multi-class classification by
397. pairwise coupling, Journal of Machine Learning Research 5 (2004) 975–1005.
[95] R. Bryll, R. Gutierrez-Osuna, F. Quek, Attribute bagging: improving accuracy [122] J. Friedman, Another Approach to Polychotomous Classification, Tech. rep.,
of classifier ensembles by using random feature subsets, Pattern Recognition Department of Statistics, Stanford University, 1996.
36 (6) (2003) 1291–1302. [123] E. Hüllermeier, S. Vanderlooy, Combining predictions in pairwise
[96] Y. Baram, Partial classification: the benefit of deferred decision, IEEE classification: an optimal adaptive voting strategy and its relation to
Transactions on Pattern Analysis and Machine Intelligence 20 (8) (1998) weighted voting, Pattern Recognition 43 (1) (2010) 128–142.
769–776. [124] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, F. Herrera, An overview of
[97] L. Cordella, P. Foggia, C. Sansone, F. Tortorella, M. Vento, A cascaded multiple ensemble methods for binary classifiers in multi-class problems:
expert system for verification, in: Multiple Classifier Systems, Lecture Notes Experimental study on one-vs-one and one-vs-all schemes, Pattern
in Computer Science, vol. 1857, Springer, Berlin/Heidelberg, 2000, pp. 330– Recognition 44 (8) (2011) 1761–1776.
339. [125] D. Tax, R.P.W. Duin, Characterizing one-class datasets, in: Proceedings of the
[98] K. Goebel, W. Yan, Choosing classifiers for decision fusion, in: Proceedings of Sixteenth Annual Symposium of the Pattern Recognition Association of South
the Seventh International Conference on Information Fusion, 2004, pp. 563– Africa, 2005, pp. 21–26.
568. [126] D. Tax, R. Duin, Combining one-class classifiers, in: Proceedings of the Second
[99] B. Baruque, S. Porras, E. Corchado, Hybrid classification ensemble using International Workshop on Multiple Classifier Systems, MCS ’01, Springer-
topology-preserving clustering, New Generation Computing 29 (2011) 329– Verlag, London, UK, 2001, pp. 299–308.
344. [127] T. Wilk, M. Wozniak, Soft computing methods applied to combination of one-
[100] L. Kuncheva, Clustering-and-selection model for classifier combination, in: class classifiers, Neurocomputing 75 (2012) 185–193.
Proceedings of the Fourth International Conference on Knowledge-Based [128] G. Giacinto, R. Perdisci, M. Del Rio, F. Roli, Intrusion detection in computer
Intelligent Engineering Systems and Allied Technologies, 2000, vol. 1, 2000, networks by a modular ensemble of one-class classifiers, Information Fusion
pp. 185–188. 9 (2008) 69–82.
[101] K. Jackowski, M. Wozniak, Algorithm of designing compound recognition [129] Y. Hu, Handbook of Neural Network Signal Processing, 1st ed., CRC Press, Inc.,
system on the basis of combining classifiers with simultaneous splitting Boca Raton, FL, USA, 2000.
feature space into competence areas, Pattern Analysis and Applications 12 (4) [130] K. Woods, W.P. Kegelmeyer Jr., K. Bowyer, Combination of multiple classifiers
(2009) 415–425. using local accuracy estimates, IEEE Transactions on Pattern Analysis and
[102] M. Wozniak, B. Krawczyk, Combined classifier based on feature space Machine Intelligence 19 (4) (1997) 405–410.
partitioning, International Journal of Applied Mathematics and Computer [131] M. Wozniak, M. Zmyslony, Combining classifiers using trained fuser –
Sciences 22 (4) (2012) 855–866. analytical and experimental results, Neural Network World 13 (7) (2010)
[103] H. Lee, C. Chen, J. Chen, Y. Jou, An efficient fuzzy classifier with feature 925–934.
selection based on fuzzy entropy, IEEE Transactions on Systems, Man, and [132] S. Raudys, Trainable fusion rules. I. Large sample size case, Neural Networks
Cybernetics, Part B: Cybernetics 31 (3) (2001) 426–432. 19 (10) (2006) 1506–1516.
[104] J. Hong, J. Min, U. Cho, S. Cho, Fingerprint classification using one-vs-all [133] M. van Erp, L. Vuurpijl, L. Schomaker, An overview and comparison of voting
support vector machines dynamically ordered with naïve Bayes classifiers, methods for pattern recognition, in: Proceedings of the Eighth International
Pattern Recognition 41 (2008) 662–671. Workshop on Frontiers in Handwriting Recognition, 2002, 2002, pp. 195–200.
[105] A.R. Ko, R. Sabourin, A. Britto, From dynamic classifier selection to dynamic [134] M. Wozniak, K. Jackowski, Some remarks on chosen methods of classifier
ensemble selection, Pattern Recognition 41 (5) (2008) 1735–1748. fusion based on weighted voting, in: E. Corchado, X. Wu, E. Oja, A. Herrero, B.
[106] L. Didaci, G. Giacinto, F. Roli, G. Marcialis, A study on the performances of Baruque (Eds.), Hybrid Artificial Intelligence Systems, Lecture Notes in
dynamic classifier selection based on local accuracy estimation, Pattern Computer Science, vol. 5572, Springer, Berlin/Heidelberg, 2009, pp. 541–548.
Recognition 38 (11) (2005) 2188–2191. [135] S. Raudys, Trainable fusion rules. II. Small sample-size effects, Neural
[107] G. Giacinto, F. Roli, Dynamic classifier selection based on multiple classifier Networks 19 (10) (2006) 1517–1527.
behavior, Pattern Recognition 34 (9) (2001) 1879–1881. [136] H. Inoue, H. Narihisa, Optimizing a multiple classifier system, in: M. Ishizuka,
[108] M. de Souto, R. Soares, A. Santana, A. Canuto, Empirical comparison of A. Sattar (Eds.), PRICAI 2002: Trends in Artificial Intelligence, Lecture Notes in
dynamic classifier selection methods based on diversity and accuracy for Computer Science, vol. 2417, Springer, Berlin/Heidelberg, 2002, pp. 1–16.
building ensembles, in: IJCNN 2008, IEEE International Joint Conference on [137] L. Alexandre, A. Campilho, M. Kamel, Combining independent and unbiased
Neural Networks, 2008, IEEE World Congress on Computational Intelligence, classifiers using weighted average., in: Proceedings ICPR 2000, 2000, pp.
2008, pp. 1480–1487. 2495–2498.
[109] T. Woloszynski, M. Kurzynski, A probabilistic model of classifier competence [138] B. Biggio, G. Fumera, F. Roli, Bayesian analysis of linear combiners, in:
for dynamic ensemble selection, Pattern Recognition 44 (10–11) (2011) 2656– [139] J. Kittler, F. Alkoot, Sum versus vote fusion in multiple classifier systems, IEEE
2668. Systems, MCS ’07, Springer-Verlag, Berlin, Heidelberg, 2007, pp. 292–301.
[110] T. Woloszynski, M. Kurzynski, P. Podsiadlo, G. Stachowiak, A measure of [139] J. Kittler, F. Alkoot, Sum versus vote fusion in multiple classifier systems, IEEE
competence based on random classification for dynamic ensemble selection, Transactions on Pattern Analysis and Machine Intelligence 25 (1) (2003)
Information Fusion 13 (3) (2012) 207–213. 110–115.
[111] W. Street, Y. Kim, A streaming ensemble algorithm (SEA) for large-scale [140] N. Rao, A generic sensor fusion problem: classification and function
classification, in: Proceedings of the Seventh ACM SIGKDD International estimation, in: F. Roli, J. Kittler, T. Windeatt (Eds.), Multiple Classifier
Conference on Knowledge Discovery and Data Mining, KDD ’01, ACM, New Systems, Lecture Notes in Computer Science, vol. 3077, Springer, 2004, pp.
York, NY, USA, 2001, pp. 377–382. 16–30.
[112] H. Wang, W. Fan, P. Yu, J. Han, Mining concept-drifting data streams using [141] D. Opitz, J. Shavlik, Generating accurate and diverse members of a neural-
ensemble classifiers, in: Proceedings of the Ninth ACM SIGKDD International network ensemble, in: NIPS, 1995, pp. 535–541.
[142] L. Rokach, O. Maimon, Feature set decomposition for decision trees, [171] J. Peters, N. Verhoest, R. Samson, M. Meirvenne, L. Cockx, B. Baets,
Intelligent Data Analysis 9 (2) (2005) 131–158. Uncertainty propagation in vegetation distribution models based on
[143] G. Fumera, F. Roli, A theoretical and experimental analysis of linear ensemble classifiers, Ecological Modelling 220 (6) (2009) 791–804.
combiners for multiple classifier systems, IEEE Transactions on Pattern [172] M. Han, X. Zhu, W. Yao, Remote sensing image classification based on neural
Analysis and Machine Intelligence 27 (6) (2005) 942–956, http://dx.doi.org/ network ensemble algorithm, Neurocomputing 78 (1) (2012) 133–138.
10.1109/TPAMI.2005.109. [173] B. Waske, M. Braun, Classifier ensembles for land cover mapping using
[144] M. Wozniak, Experiments on linear combiners, in: E. Pietka, J. Kawa (Eds.), multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote
Information Technologies in Biomedicine, Advances in Soft Computing, vol. Sensing 64 (5) (2009) 450–457 (theme Issue: Mapping with SAR: Techniques
47, Springer, Berlin/Heidelberg, 2008, pp. 445–452. and Applications).
[145] R. Duin, The combining classifier: to train or not to train? in: Proceedings of [174] B. Waske, S. van der Linden, C. Oldenburg, B. Jakimow, A. Rabe, P. Hostert,
the 16th International Conference on Pattern Recognition, 2002, vol. 2, 2002, imageRF – a user-oriented implementation for remote sensing image analysis
pp. 765–770. with random forests, Environmental Modelling & Software 35 (0) (2012)
[146] R. Jacobs, M. Jordan, S. Nowlan, G. Hinton, Adaptive mixtures of local experts, 192–193.
Neural Computation 3 (1991) 79–87. [175] U. Maulik, D. Chakraborty, A self-trained ensemble with semisupervised
[147] R. Jacobs, Methods for combining experts’ probability assessments, Neural SVM: an application to pixel classification of remote sensing imagery, Pattern
Computation 7 (5) (1995) 867–888. Recognition 44 (3) (2011) 615–623.
[148] V. Tresp, M. Taniguchi, Combining estimators using non-constant weighting [176] A. Henriques, A. Doria-Neto, R. Amaral, Classification of multispectral images
functions, Advances in Neural Information Processing Systems, vol. 7, MIT in coral environments using a hybrid of classifier ensembles,
Press, 1995, pp. 419–426. Neurocomputing 73 (7–9) (2010) 1256–1264.
[149] P. Cheeseman, M. Self, J. Kelly, J. Stutz, W. Taylor, D. Freeman, AutoClass: a [177] Y. Maghsoudi, M. Collins, D. Leckie, Polarimetric classification of boreal forest
Bayesian classification system, in: Machine Learning: Proceedings of the Fifth using nonparametric feature selection and multiple classifiers, International
International Workshop, Morgan Kaufmann, 1988. Journal of Applied Earth Observation and Geoinformation 19 (0) (2012) 139–
[150] S. Shlien, Multiple binary decision tree classifiers, Pattern Recognition 23 (7) 150.
(1990) 757–763. [178] L. Bruzzone, R. Cossu, G. Vernazza, Combining parametric and non-
[151] M. Wozniak, Experiments with trained and untrained fusers, in: E. Corchado, parametric algorithms for a partially unsupervised classification of
J. Corchado, A. Abraham (Eds.), Innovations in Hybrid Intelligent Systems, multitemporal remote-sensing images, Information Fusion 3 (4) (2002)
Advances in Soft Computing, vol. 44, Springer, Berlin/Heidelberg, 2007, pp. 289–297.
144–150. [179] L. Bruzzone, R. Cossu, G. Vernazza, Detection of land-cover transitions by
[152] M. Wozniak, Evolutionary approach to produce classifier ensemble based on combining multidate classifiers, Pattern Recognition Letters 25 (13) (2004)
weighted voting, in: NaBIC 2009, World Congress on Nature & Biologically 1491–1500.
Inspired Computing, 2009, IEEE, 2009, pp. 648–653. [180] P. Du, S. Liu, J. Xia, Y. Zhao, Information fusion techniques for change
[153] L. Lin, X. Wang, B. Liu, Combining multiple classifiers based on statistical detection from multi-temporal remote sensing images, Information Fusion
method for handwritten Chinese character recognition, in: Proceedings of the detection from multi-temporal remote sensing images, Information Fusion
2002 International Conference on Machine Learning and Cybernetics, 2002, [181] P. Arun-Raj-Kumar, S. Selvakumar, Distributed denial of service attack
vol. 1, 2002, pp. 252–255. detection using an ensemble of neural classifier, Computer
[154] Z. Zheng, B. Padmanabhan, Constructing ensembles from data envelopment Communications 34 (11) (2011) 1328–1341.
analysis, INFORMS Journal on Computing 19 (4) (2007) 486–496. [182] P. Kumar, S. Selvakumar, Detection of distributed denial of service attacks
[155] D. Wolpert, Stacked generalization, Neural Networks 5 (1992) 241–259. using an ensemble of adaptive and hybrid neuro-fuzzy systems, Computer
[156] Y. Huang, C. Suen, A method of combining multiple experts for the Communications (0) (2012).
recognition of unconstrained handwritten numerals, IEEE Transactions on [183] A. Shabtai, R. Moskovitch, Y. Elovici, C. Glezer, Detection of malicious code by
Pattern Analysis and Machine Intelligence 17 (1) (1995) 90–94. applying machine learning classifiers on static features: a state-of-the-art
[157] M.M. Gaber, A. Zaslavsky, S. Krishnaswamy, Mining data streams: a review, survey, Information Security Technical Report 14 (1) (2009) 16–29.
SIGMOD Record 34 (2) (2005) 18–26. [184] M. Locasto, K. Wang, A. Keromytis, S. Stolfo, Flips: hybrid adaptive intrusion
[158] A. Patcha, J.-M. Park, An overview of anomaly detection techniques: existing prevention, in: Proceedings of the 8th International Conference on Recent
solutions and latest technological trends, Computer Network 51 (12) (2007) Advances in Intrusion Detection, RAID’05, Springer-Verlag, Berlin,
3448–3470. Heidelberg, 2006, pp. 82–101.
[159] M.M. Black, R.J. Hickey, Classification of customer call data in the presence of [185] K. Wang, G. Cretu, S. Stolfo, Anomalous payload-based worm detection and
concept drift and noise, in: Proceedings of the First International Conference signature generation, in: Proceedings of the 8th International Conference on
on Computing in an Imperfect World, Soft-Ware 2002, Springer-Verlag, Recent Advances in Intrusion Detection, RAID’05, Springer-Verlag, Berlin,
London, UK, 2002, pp. 74–87. Heidelberg, 2006, pp. 227–246.
[160] H. Wang, W. Fan, P.S. Yu, J. Han, Mining concept-drifting data streams using [186] S. Peddabachigari, A. Abraham, C. Grosan, J. Thomas, Modeling intrusion
ensemble classifiers, in: Proceedings of the Ninth ACM SIGKDD International detection system using hybrid intelligent systems, Journal of Network and
Conference on Knowledge Discovery and Data Mining, KDD ’03, ACM, New Computer Applications 30 (1) (2007) 114–132.
York, NY, USA, 2003, pp. 226–235. [187] D.-I. Curiac, C. Volosencu, Ensemble based sensing anomaly detection in
[161] M.M. Gaber, P.S. Yu, Classification of changes in evolving data streams using wireless sensor networks, Expert Systems with Applications 39 (10) (2012)
online clustering result deviation, in: Proc. Of International Workshop on 9087–9096.
Knowledge Discovery in Data Streams, 2006. [188] R.J. Bolton, D.J. Hand, Statistical fraud detection: a review, Statistical Science
[162] M. Markou, S. Singh, Novelty detection: a review – Part 1: Statistical 17 (3) (2002) 235–255.
approaches, Signal Process 83 (12) (2003) 2481–2497. [189] F. Louzada, A. Ara, Bagging k-dependence probabilistic networks: an
[163] M. Salganicoff, Density-adaptive learning and forgetting, in: Machine alternative powerful fraud detection tool, Expert Systems with Applications
Learning: Proceedings of the Tenth Annual Conference, Morgan Kaufmann, 39 (14) (2012) 11583–11592.
San Francisco, CA, 1993. [190] S. Bhattacharyya, S. Jha, K. Tharakunnel, J. Westland, Data mining for credit
[164] R. Klinkenberg, T. Joachims, Detecting concept drift with support vector card fraud: a comparative study, Decision Support Systems 50 (3) (2011)
machines, in: Proceedings of the Seventeenth International Conference on 602–613.
Machine Learning, ICML ’00, Morgan Kaufmann Publishers Inc., San Francisco, [191] L. Yu, W. Yue, S. Wang, K. Lai, Support vector machine based multiagent
CA, USA, 2000, pp. 487–494. ensemble learning for credit risk evaluation, Expert Systems with
[165] M. Baena-García, J. del Campo-Ávila, R. Fidalgo, A. Bifet, R. Gavaldà, R. [192] Y. Kim, S. Sohn, Stock fraud detection using peer group analysis, Expert
Morales-Bueno, Early drift detection method, in: Fourth International [192] Y. Kim, S. Sohn, Stock fraud detection using peer group analysis, Expert
Workshop on Knowledge Discovery from Data Streams, 2006. Systems with Applications 39 (10) (2012) 8986–8992.
[166] I. Zliobaite, Change with delayed labeling: when is it detectable?, in: [193] B. Twala, Multiple classifier application to credit risk assessment, Expert
Proceedings of the 2010 IEEE International Conference on Data Mining Systems with Applications 37 (4) (2010) 3326–3336.
Workshops, ICD-MW ’10, IEEE Computer Society, Washington, DC, USA, 2010, [194] S. Finlay, Multiple classifier architectures and their application to credit risk
pp 843–850. assessment, European Journal of Operational Research 210 (2) (2011) 368–
[167] G. Giacinto, F. Roli, L. Bruzzone, Combination of neural and statistical 378.
algorithms for supervised classification of remote-sensing images, Pattern [195] G. Wang, J. Ma, A hybrid ensemble approach for enterprise credit risk
Recognition Letters 21 (5) (2000) 385–397. assessment based on support vector machine, Expert Systems with
[168] V. Rodriguez-Galiano, B. Ghimire, J. Rogan, M. Chica-Olmo, J. Rigol-Sanchez, Applications 39 (5) (2012) 5325–5331.
An assessment of the effectiveness of a random forest classifier for land-cover [196] M. Kim, D. Kang, Classifiers selection in ensembles using genetic algorithms
classification, ISPRS Journal of Photogrammetry and Remote Sensing 67 (0) for bankruptcy prediction, Expert Systems with Applications 39 (10) (2012)
(2012) 93–104. 9308–9314.
[169] P. Gislason, J. Benediktsson, J. Sveinsson, Random forests for land cover [197] P. Ravisankar, V. Ravi, I. Bose, Failure prediction of dotcom companies using
classification, Pattern Recognition Letters 27 (4) (2006) 294–300. neural network–genetic programming hybrids, Information Sciences 180 (8)
[170] J.-W. Chan, D. Paelinckx, Evaluation of random forest and Adaboost tree- (2010) 1257–1267.
based ensemble classification and spectral band selection for ecotope [198] P. Ravisankar, V. Ravi, G. Rao, I. Bose, Detection of financial statement fraud
mapping using airborne hyperspectral imagery, Remote Sensing of and feature selection using data mining techniques, Decision Support
Environment 112 (6) (2008) 2999–3011. Systems 50 (2) (2011) 491–500.
[199] C. Tsai, Combining cluster analysis with classifier ensembles to predict [215] A. Savio, M. Garcia-Sebastian, D. Chyzyk, C. Hernandez, M. Graña, A. Sistiaga,
financial distress, Information Fusion (0) (2011). A.L. de Munain, J. Villanua, Neurocognitive disorder detection based on
[200] Y. Peng, G. Wang, G. Kou, Y. Shi, An empirical study of classification algorithm feature vectors extracted from VBM analysis of structural MRI, Computers in
evaluation for financial risk prediction, Applied Soft Computing 11 (2) (2011) Biology and Medicine 41 (8) (2011) 600–610.
2906–2915. [216] D. Chyzhyk, M. Graña, A. Savio, J. Maiora, Hybrid dendritic computing with
[201] V. Ravi, H. Kurniawan, P. Nwee-Kok-Thai, P. Ravi-Kumar, Soft computing kernel-LICA applied to Alzheimer’s disease detection in MRI, Neurocomputing
system for bank performance prediction, Applied Soft Computing 8 (1) (2008) 75 (1) (2012) 72–77.
305–315. [217] L. Kuncheva, J. Rodriguez, Classifier ensembles for fMRI data analysis: an
[202] H. Zhao, A. Sinha, W. Ge, Effects of feature construction on classification experiment, Magnetic Resonance Imaging 28 (4) (2010) 583–593.
performance: an empirical study in bank failure prediction, Expert Systems [218] C. Plumpton, L. Kuncheva, N. Oosterhof, S. Johnston, Naive random subspace
with Applications 36 (2, Part 2) (2009) 2633–2644. ensemble with linear classifiers for real-time classification of fMRI data,
[203] K. Aral, H. Guvenir, I. Sabuncuoglu, A. Akar, A prescription fraud detection Pattern Recognition 45 (6) (2012) 2101–2108.
model, Computer Methods and Programs in Biomedicine 106 (1) (2012) 37– [219] C. Cabral, M. Silveira, P. Figueiredo, Decoding visual brain states from fMRI
46. using an ensemble of classifiers, Pattern Recognition 45 (6) (2012) 2064–
[204] I. Christou, M. Bakopoulos, T. Dimitriou, E. Amolochitis, S. Tsekeridou, C. 2074.
Dimitriadis, Detecting fraud in online games of chance and lotteries, Expert [220] G. Adomavicius, R. Sankaranarayanan, S. Sen, A. Tuzhilin, Incorporating
Systems with Applications 38 (10) (2011) 13158–13169. contextual information in recommender systems using a multidimensional
[205] H. Farvaresh, M. Sepehri, A data mining framework for detecting subscription approach, ACM Transactions Information Systems 23 (1) (2005) 103–145.
fraud in telecommunication, Engineering Applications of Artificial [221] J. Konstan, J. Riedl, How online merchants predict your preferences and prod
Intelligence 24 (1) (2011) 182–194. you to purchase, IEEE Spectrum 49 (10) (2012) 48–56.
[206] L. Subelj, S. Furlan, M. Bajec, An expert system for detecting automobile [222] R. Burke, Hybrid recommender systems: survey and experiments, User
insurance fraud using social network analysis, Expert Systems with Modeling and User-Adapted Interaction 12 (4) (2002) 331–370.
Applications 38 (1) (2011) 1039–1052. [223] M. Balabanović, Y. Shoham, Fab: content-based, collaborative
[207] A.X. Garg, N.K.J. Adhikari, H. McDonald, M.P. Rosas-Arellano, P.J. Devereaux, J. recommendation, Communications of the ACM 40 (3) (1997) 66–72.
Beyene, J. Sam, R.B. Haynes, Effects of computerized clinical decision support [224] M.J. Pazzani, A framework for collaborative, content-based and demographic
systems on practitioner performance and patient outcomes: a systematic filtering, Artificial Intelligence Review 13 (5–6) (1999) 393–408.
review, Journal of the American Medical Association 293 (10) (2005) 1223– [225] M. Jahrer, A. Töscher, R. Legenstein, Combining predictions for accurate
1238. recommender systems, in: Proceedings of the 16th ACM SIGKDD
[208] J. Eom, S. Kim, B. Zhang, AptaCDSS-E: a classifier ensemble-based clinical International Conference on Knowledge Discovery and Data Mining, KDD
decision support system for cardiovascular disease level prediction, Expert ’10, ACM, New York, NY, USA, 2010, pp. 693–702.
Systems with Applications 34 (4) (2008) 2465–2479. [226] C. Porcel, A. Tejeda-Lorente, M. Martínez, E. Herrera-Viedma, A hybrid
[209] R. Das, I. Turkoglu, A. Sengur, Effective diagnosis of heart disease through recommender system for the selective dissemination of research resources in
neural networks ensembles, Expert Systems with Applications 36 (4) (2009) a technology transfer office, Information Sciences 184 (1) (2012) 1–19.
7675–7680. [227] M. Claypool, A. Gokhale, T. Miranda, P. Murnikov, D. Netes, M. Sartin,
[210] R. Das, I. Turkoglu, A. Sengur, Diagnosis of valvular heart disease through Combining content-based and collaborative filters in an online newspaper,
neural networks ensembles, Computer Methods and Programs in in: Proceedings of the ACM SIGIR ’99 Workshop on Recommender Systems:
Biomedicine 93 (2) (2009) 185–191. Algorithms and Evaluation, ACM, 1999.
[211] W. Baxt, Improving the accuracy of an artificial neural network using [228] D. Billsus, M. Pazzani, User modeling for adaptive news access, User Modeling
multiple differently trained networks, Neural Computation 4 (5) (1992) 772– and User-Adapted Interaction 10 (2–3) (2000) 147–180.
780. [229] T. Tran, R. Cohen, Hybrid recommender systems for electronic commerce, in:
[212] X. Zhang, J. Mesirov, D. Waltz, Hybrid system for protein secondary structure Knowledge-Based Electronic Markets, Papers from the AAAI Workshop, AAAI
prediction, Journal of Molecular Biology 225 (4) (1992) 1049–1063. Technical Report WS-00-04, AAAI Press, Menlo Park, CA, 2000, pp. 78–83.
[213] L. Nanni, Ensemble of classifiers for protein fold recognition, [230] M. Kunaver, T. Pozrl, M. Pogacnik, J. Tasic, Optimisation of combined
Neurocomputing 69 (7) (2006) 850–853. collaborative recommender systems, AEU – International Journal of
[214] T. Yang, V. Kecman, L. Cao, C. Zhang, J.Z. Huang, Margin-based ensemble Electronics and Communications 61 (7) (2007) 433–443.
classifier for protein fold recognition, Expert Systems with Applications 38 [231] M. Goksedef, S. Gundoz-Oguducu, Combination of web page recommender
(10) (2011) 12348–12355. systems, Expert Systems with Applications 37 (4) (2010) 2911–2922.