A hybrid ensemble pruning approach based on consensus clustering and multi-objective evolutionary algorithm for sentiment classification
Article history: Received 16 May 2016; Revised 26 December 2016; Accepted 16 February 2017; Available online 10 March 2017

Keywords: Ensemble pruning; Consensus clustering; Multi-objective evolutionary algorithm; Sentiment classification

Abstract

Sentiment analysis is a critical task of extracting subjective information from online text documents. Ensemble learning can be employed to obtain more robust classification schemes. However, most approaches in the field incorporated feature engineering to build efficient sentiment classifiers.
The purpose of our research is to establish an effective sentiment classification scheme by pursuing the paradigm of ensemble pruning. Ensemble pruning is a crucial method to build classifier ensembles with high predictive accuracy and efficiency. Previous studies employed exponential search, randomized search, sequential search, ranking based pruning and clustering based pruning. However, there are tradeoffs in selecting the ensemble pruning methods. In this regard, hybrid ensemble pruning schemes can be more promising.
In this study, we propose a hybrid ensemble pruning scheme based on clustering and
randomized search for text sentiment classification. Furthermore, a consensus clustering
scheme is presented to deal with the instability of clustering results. The classifiers of
the ensemble are initially clustered into groups according to their predictive character-
istics. Then, two classifiers from each cluster are selected as candidate classifiers based on
their pairwise diversity. The search space of candidate classifiers is explored by the elitist
Pareto-based multi-objective evolutionary algorithm.
For the evaluation task, the proposed scheme is tested on twelve balanced and un-
balanced benchmark text classification tasks. In addition, the proposed approach is ex-
perimentally compared with three ensemble methods (AdaBoost, Bagging and Random
Subspace) and three ensemble pruning algorithms (ensemble selection from libraries of
models, Bagging ensemble selection and LibD3C algorithm). Results demonstrate that the
consensus clustering and the elitist pareto-based multi-objective evolutionary algorithm
can be effectively used in ensemble pruning. The experimental analysis with conventional
ensemble methods and pruning algorithms indicates the validity and effectiveness of the
proposed scheme.
© 2017 Elsevier Ltd. All rights reserved.
1. Introduction
Ensemble learning is an important research direction of pattern recognition and machine learning. The main idea behind
ensemble learning is to combine the predictions of multiple classification algorithms so that a more robust and accurate
classification model can be constructed (Dietterich, 2000). With the use of ensemble learning, a remarkable improvement
in generalization ability can be achieved. In addition, the variance and bias of classification and the dependency of the
results on the odd characteristics of a single training set can be reduced (Kuncheva, 2014). Ensemble learning can be
successfully utilized for supervised learning (such as classification and regression) and for unsupervised learning (such as
cluster analysis) (Strehl & Ghosh, 2003).
Sentiment analysis (also known as opinion mining) is a subfield of natural language processing and text mining, which
aims to classify text documents as positive, negative and neutral. The Web is a rich, widely distributed source of information
with a progressively expanding volume of data (Bhatia & Khalid, 2008). The information available on the Web can provide
valuable information to governments, business organizations and individual decision makers. Public sentiment toward
policies, products and services can be beneficial to organizations (Wang, Sun, Ma, Xu, & Gu, 2014). The identification of
subjective information is very important to generate structured knowledge that will provide crucial information to decision
support systems and individual decision makers (Fersini, Messina, & Pozzi, 2014). Hence, sentiment analysis is an important
research direction. Recent research contributions on sentiment analysis indicate that the predictive performance of senti-
ment classification can be greatly improved with the use of ensemble learning methods (Fersini et al., 2014; Wang et al.,
2014; Xia, Zong, & Li, 2011).
The ensemble learning process consists of three main phases, which are the ensemble generation, the ensemble pruning
and the ensemble integration (Mendes-Moreira, Soares, Jorge, & De Sousa, 2012; Roli, Giacinto, & Vernazza, 2001). In the
ensemble generation phase, the classification algorithms to be utilized in the classifier ensemble are generated. The learning
algorithms can be generated in either a homogeneous or a heterogeneous manner. In homogeneous classifier ensembles, the same learning
algorithm is utilized. In this scheme, the diversity is achieved by taking different parameter values, by randomization of
the learning process, by differentiation of the training subsets and/or by taking different input attributes (Tsoumakas,
Partalas, & Vlahavas, 2008). Bagging and Boosting algorithms are two well-known representatives for homogenous classifier
ensemble methods. In contrast, heterogeneous classifier ensembles are generated by using different learning algorithms. In
this scheme, high diversity of the ensemble is expected. In the ensemble integration phase, the predictions of multiple learning
algorithms are combined by using an ensemble combination rule, such as majority voting and stacked generalization.
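For instance, a majority-voting combiner for a single test instance can be sketched as follows; this is a generic Python illustration, not the implementation used in the paper.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine the class labels predicted by the base classifiers for one instance
    by simple majority voting (ties resolved by the first-seen label)."""
    return Counter(predictions).most_common(1)[0][0]

# Example: three of four base classifiers predict "positive".
print(majority_vote(["positive", "negative", "positive", "positive"]))  # -> "positive"
```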
The ensemble pruning (also known as selective ensemble, ensemble thinning and ensemble selection) is the process
of obtaining a subset of classifiers from the classifier ensemble so that the predictive performance and computational
efficiency of the ensemble are enhanced. It has been empirically validated that the utilization of some of the classification
algorithms, rather than all available classifiers, enhances the predictive performance of the classifier ensemble (Zhou, Wu, &
Tang, 2002). One of the critical issues in developing classifier ensembles is to provide high diversity (Gashler, Giraud-Carrier,
& Martinez, 2008). A diverse subset of classifiers can be obtained with the ensemble pruning. The ensemble pruning meth-
ods can be broadly divided into five groups, as exponential search, randomized search, sequential search, ranking based
pruning and clustering based pruning methods (Mendes-Moreira et al., 2012). In exponential search methods, all possible
subsets of the classification algorithms within the ensemble are taken into consideration. This requires enumerating 2^k − 1
possible subsets for a classifier ensemble containing k classification algorithms, which makes the search space large. In
randomized search methods, the metaheuristic search algorithms are utilized to explore more effectively the search space of
possible classifiers. The metaheuristic algorithms utilized in the ensemble pruning include genetic algorithms, tabu search,
population based incremental learning and stochastic search (Ruta & Gabrys, 2001; Zhou & Tang, 2003; Partalas et al.,
2006). In sequential search methods, the search can be forward, backward or forward-backward. In the forward methods,
the search starts with an empty classifier ensemble. In each iteration, a learning algorithm is added to the ensemble. In
contrast, classifier ensemble starts with all learning algorithms and the learning algorithms are iteratively eliminated from
the ensemble in the backward methods. In ranking based pruning, the learning algorithms of the classifier ensemble are
ranked based on a particular evaluation criterion. The k learning algorithms with the highest evaluation value are included
in the pruned ensemble, whereas the other algorithms are eliminated. In clustering based pruning methods, a clustering
algorithm (such as k-means) is utilized to group the classification algorithms within the ensemble into clusters based on
the predictive performance of the algorithms. In this scheme, a number of classifiers are selected from each cluster to
construct the pruned classifier ensemble (Mendes-Moreira et al., 2012).
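As a concrete illustration of the simplest of these families, the sketch below shows ranking-based pruning with already fitted, scikit-learn-style classifiers; the validation-accuracy criterion, the function name and the choice of k are illustrative assumptions rather than the configurations studied in this paper.

```python
import numpy as np

def ranking_based_pruning(classifiers, X_val, y_val, k):
    """Ranking-based pruning: score every (already fitted) classifier on a
    validation set and keep the k highest-ranked ones."""
    scores = np.array([np.mean(clf.predict(X_val) == y_val) for clf in classifiers])
    top_k = np.argsort(scores)[::-1][:k]          # indices of the k best classifiers
    return [classifiers[i] for i in top_k]
```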
The identification of an appropriate subset of classifiers is an NP-complete problem and it is a computationally intensive
task. Since exponential methods require enumerating all possible subsets, these methods are only suitable for a classifier
ensemble with a few classifiers (Martinez-Munoz & Suarez, 2006; Tamon & Xiang, 2000). The ranking based pruning
methods require computing an evaluation criterion for each classifier. Since the decision of classifiers to be included within
the ensemble is based on the evaluation criterion, the ranking based methods are computationally efficient techniques.
However, the predictive performance of classifier ensembles obtained by ranking based pruning is relatively low (Pinto,
2013). The clustering based pruning methods may suffer from the cluster instability (Lin et al., 2014). There are tradeoffs in
selecting the ensemble pruning methods.
The development and application of hybrid algorithms is a promising research interest in machine learning. In addition,
recent research contributions in the ensemble pruning indicate that an effective ensemble pruning scheme can be built
by integrating the two types of ensemble pruning (Cheng, Wang, Hou, Tan, & Cao, 2013; Lin et al., 2014). This research
is motivated by the performance of hybrid schemes in ensemble pruning. In this direction, the paper presents a hybrid
ensemble pruning approach, which integrates clustering and randomized search based pruning. As mentioned above,
the clustering based pruning methods may suffer from the instability. Consensus clustering is the process of aggregating
a number of different clustering results so that a more robust and stable final clustering result can be obtained (Ghosh &
Acharya, 2011). In this regard, consensus clustering can be a viable method to build robust clusters in the ensemble pruning.
In our proposed scheme, consensus clustering is utilized to group the classifiers of the ensemble into clusters based on
their predictive characteristics on the training set. The consensus clustering scheme utilizes self-organizing maps, expecta-
tion maximization and K-means++ as the base clustering algorithms. Then, two classifiers from each cluster are selected as
the candidate classifiers based on their Q-statistics value. Finally, the search space of candidate classifiers is examined by the
elitist pareto-based multi-objective evolutionary algorithm to identify the pruned ensemble. The performance of proposed
hybrid ensemble pruning scheme has been empirically validated on twelve public text sentiment classification benchmarks.
To summarize, many papers have examined the utilization of ensemble learning algorithms on sentiment analysis,
motivated by the predictive performance of ensemble learning methods (Prabowo & Thelwall, 2009; Wang et al., 2014; Xia
et al., 2011). In order to obtain a promising generalization performance with the ensemble methods, large ensembles, which
involve large storage space and a long time to construct the predictive model, are required (Hernandez-Lobato, Martinez-Munoz,
& Suarez, 2011). With the use of ensemble pruning, the aforementioned problems of ensemble methods can be reduced,
while improving generalization performance. Ensemble pruning can be effectively utilized in classification and regression
problems (Ma, Dai, & Liu, 2015). Ensemble learning on sentiment analysis has been extensively studied in the literature.
However, there are few works examining the predictive performance of ensemble pruning methods on text classification
and sentiment analysis. Motivated by the use of ensemble pruning methods to enhance the robustness and predictive
performance of classification models on several domains, this paper seeks to develop an efficient ensemble for sentiment
analysis based on pruning.
The main contributions of our proposed hybrid ensemble pruning scheme can be summarized as follows:
• A novel hybrid ensemble pruning approach based on consensus clustering and the elitist pareto-based multi-objective
evolutionary algorithm has been proposed. To the best of our knowledge, this is the first study in ensemble pruning,
which employs the paradigm of consensus clustering to obtain stable clusters. In addition, this is the first study, where
the elitist pareto-based multi-objective evolutionary algorithm has been employed to explore the search space of
possible classifiers.
• Ensemble learning is a well-studied research direction in sentiment analysis and text classification. However, the
ensemble pruning has been underexplored in sentiment analysis and text classification domain. In this regard, this is the
first comprehensive study on constructing an efficient classifier ensemble for text classification based on the paradigm
of ensemble pruning.
The rest of this paper is organized as follows. Section 2 presents the literature review on ensemble pruning and ensem-
ble learning in sentiment analysis. Section 3 presents the research objectives, Section 4 presents the theoretical foundations, Section 5 presents our proposed hybrid ensemble pruning approach, Section 6 presents the experimental results and Section 7 presents the concluding remarks.
2. Related works
This section briefly reviews the related work on the ensemble pruning and the ensemble learning in sentiment analysis.
Ruta and Gabrys (2001) examined the performance of three randomized pruning algorithms, namely genetic algo-
rithms, tabu search and population-based incremental learning. In order to explore the search space with the evolutionary
optimization methods, the majority voting error is utilized as the fitness function. Zhou, Wu, and Tang (2002) presented
a randomized ensemble pruning scheme based on genetic algorithms, which is called GASEN (Genetic algorithm based
selective ensemble). The scheme starts by training a number of neural networks. Then, it assigns random weight values to
the neural networks and adjusts the weight values with the use of genetic algorithm. Finally, some of the neural networks
are selected based on the evolved weight values. In another study, Zhou and Tang (2003) utilized GASEN ensemble
pruning scheme to obtain a pruned ensemble of decision tree algorithms. Sheen and Sirisha (2013) presented an ensemble
pruning algorithm based on harmony search. In this scheme, multiple heterogeneous classifiers are utilized to construct
the ensemble of classifiers. Then, the search space of classifier subsets is evaluated by using the harmony search algorithm.
The proposed scheme is applied for malware detection. Dai (2013) presented an ensemble pruning algorithm based on
randomized greedy selective search. In this scheme, the randomization is integrated into the process of greedy ensemble
pruning. In addition, the pruned ensemble is obtained by the ballot. In another study, Dai and Liu (2013) presented a
backtracking ensemble pruning algorithm, where repeated solution vectors are not allowed to enhance the search efficiency
of the scheme. Mendialdua, Arruti, Jauregi, Lazkano, and Sierra (2015) employed the estimation of distribution algorithm to
select the appropriate classifiers to be included in the pruned ensemble.
Aksela (2003) presented an exponential search based approach for the ensemble pruning. In this scheme, the perfor-
mance of several selection measures, such as correlation between errors, Q statistics and mutual information, are taken
into account. In another study, Martinez-Munoz and Suarez (2007) presented an ensemble pruning algorithm based on
reweighting the training instances, as in AdaBoost, to modify the random aggregation ordering of Bagging.
The experimental results indicated that the pruned ensemble can improve the classification speed, reduce the memory
requirements and enhance the predictive performance.
Margineantu and Dietterich (1997) described a sequential ensemble pruning algorithm called the reduce error pruning
with back-fitting. The algorithm performs similarly to forward stepwise selection in the first two stages. In the third
stage, a classifier is added to the classifier ensemble such that the voted combination of all classifiers of the ensemble has
the lowest pruning set error. In another study, Caruana, Niculescu-Mizil, Crew, and Ksikes (2004) presented an ensemble
selection scheme from a library of classification algorithms. In this scheme, many different machine learning algorithms
are utilized to construct a model library. Then, a selection strategy, such as the forward stepwise selection, is utilized to
guide the search of subsets. Coelho and Von Zuben (2006) presented two sequential search based approaches for ensemble
pruning, namely the ensemble pruning without exploration and the ensemble pruning with exploration. In the ensemble
pruning without exploration, the candidates are initially ranked according to their performance on the validation set. Then,
the classifier with the worst performance is removed from the ensemble. If the classifier ensemble has a higher predictive
performance with the removal of worst classifier, it is pruned from the ensemble. Otherwise, the classifier is inserted again
to the ensemble. The selection process continues as long as the predictive performance enhances. In the ensemble pruning
with exploration, the candidates are again ranked. Rather than choosing only the classifier with the worst performance, all
the base learners of the ensemble are considered at each iteration. Partalas, Tsoumakas, and Vlahavas (2012) examined the
predictive performance of greedy search based ensemble pruning schemes. In the comparative analysis, different aspects
of greedy search based methods, such as the direction of the search, the evaluation data set, the evaluation measure and
the size of the final ensemble, are taken into account. Greedy ensemble pruning approaches have high efficiency. However,
they tend to obtain suboptimal solutions. In response, Dai, Zhang, and Liu (2015) presented a reverse reduce error-based
ensemble pruning algorithm which incorporates a subtraction operation. In this scheme, the classifiers that are not included
in the pruned ensemble are further examined.
Kotsiantis and Pintelas (2005) presented a ranking based ensemble pruning algorithm. The algorithm consists of six
stages. Initially, about 20% of the initial dataset is sampled at random. The new data set is randomly divided into three parts. Two of the parts are used as the training set and the remaining part is used as the test set. The results are obtained by averaging over the three test partitions. For each algorithm, a t-test is performed to compare it with the most accurate algorithm. If a particular algorithm has statistically worse performance according to the t-test (p < 0.05), the algorithm is rejected. In another
study, Swiderski, Osowski, Kruk, and Barhoumi (2016) presented a dynamic classifier selection algorithm. For each test
sample, one classifier from the ensemble with the best performance is selected to perform the final classification task. The
performance of classification algorithms is evaluated in terms of local discriminatory power. Galar, Fernandez, Barrenechea,
Bustince, and Herrera (2016) presented an ordering-based pruning metric to deal with imbalanced classification problems,
where classes have skewed distribution.
Zhang and Cao (2014) introduced a clustering based ensemble pruning scheme. In this scheme, a spectral clustering
algorithm is utilized to group classifiers in the ensemble into two clusters based on the classifier similarity defined in terms
of predictive accuracy and diversity. Then, one cluster of the ensemble is pruned. Xiao, Xiao and Wang (2016) presented an
ensemble classification scheme for credit scoring based on supervised clustering. In this scheme, clustering was employed
to partition the data samples of each class. Then, the training subsets with high diversity are obtained by combining
clusters from different classes.
Lin et al. (2014) presented a hybrid ensemble pruning algorithm based on k-means clustering and dynamic selection.
In another study, Mousavi and Eftekhari (2015) proposed a hybrid ensemble pruning algorithm which combines static
and dynamic ensemble strategies with NSGA-II multi-objective genetic algorithm. Cavalcanti, Oliveira, Moura, and Car-
valho (2016) presented a hybrid ensemble pruning algorithm based on multiple diversity measures (such as Q-statistics,
correlation coefficient, Kappa statistics and double-fault measure), genetic algorithm and a graph coloring algorithm.
Machine learning methods have been extensively employed for sentiment analysis due to their predictive performance.
The performance of ensemble learning on sentiment analysis has been examined in the literature.
Recent studies on sentiment analysis indicate that ensemble learning can enhance the predictive performance (Prabowo
& Thelwall, 2009; Wang et al., 2014; Xia et al., 2011). For instance, Prabowo and Thelwall (2009) examined the use of
ensemble learning in sentiment classification by combining the general inquirer based classifier, rule-based classifier,
statistics based classifier, rule-based classifier and support vector machines in a number of different ways. In another study,
Xia et al. (2011) combined different feature sets (such as part of speech information and world relation features) and
classification algorithms (such as Naïve Bayes, maximum entropy and support vector machines) to examine their predictive
performance. del Pilar Salas-Zarate et al. (2014) examined the effectiveness of the psychological and linguistic features
in the classification of Spanish opinions. Fersini et al. (2014) introduced a Bayesian model averaging based approach for
sentiment classification. In this scheme, the classifiers to be included in the ensemble were selected based on a heuristic
combination strategy. In another study, the predictive performance of three ensemble learning algorithms (Bagging, Random
Subspace and Boosting) in conjunction with five classification algorithms (Naïve Bayes, maximum entropy, decision tree,
K-nearest neighbor and support vector machines) were empirically evaluated (Wang et al., 2014). The empirical analysis
indicated that the Random Subspace method obtains promising results on sentiment analysis. In another study, Elghazel,
Aussem, Gharroudi, and Saadaoui (2016) presented an ensemble scheme for text classification based on rotation forest
and latent semantic indexing. Similarly, Onan, Korukoğlu, and Bulut (2016) examined the predictive performance of five
statistical keyword extraction methods in conjunction with ensemble learning methods for text classification.
Sentiment analysis on Twitter data has attracted much research attention. For instance, Da Silva, Hruschka, and Hruschka
(2014) examined the performance of several representation schemes (such as bag-of-words and feature hashing) and
different classification schemes obtained by the combination of lexicons, bag-of-words, emoticons and feature hashing on
Twitter data. The experimental analysis indicated that ensemble of Naïve Bayes, support vector machines, Random Forest
and logistic regression yields promising results. In another study, Khan, Bashir, and Qamar (2014) presented an ensemble
classification scheme for sentiment analysis of Twitter data, where the classifiers were applied in a pipelined manner.
Similarly, Saif, He, Fernandez, and Alani (2016) presented a lexicon based approach for sentiment analysis on Twitter data.
In this scheme, the co-occurrence patterns of words were also considered to capture their semantics.
Wang, Zhang, Sun, Yang, and Larson (2015) introduced an ensemble classification scheme for sentiment analysis based
on Random Subspace and part of speech analysis. Fusilier, Montes-y-Gomez, Rosso, and Cabrera (2015) presented a
semi-supervised classification scheme for detection of positive and negative deceptive opinions. Sun, Wang, Cheng, and Fu
(2015) presented an ensemble scheme based on inferred sentiment feedback information and one class collaborative filter-
ing for improving the social media item recommendation. In another study, Xia, Xu, Yu, Qi, and Cambria (2016) proposed
a three-staged, ensemble classification scheme for document-level sentiment classification. In this scheme, a rule-based
method was employed to detect explicit negations and contrasts, and a statistical method was employed to detect implicit
inconsistencies. In another study, Fersini, Messina, & Pozzi (2016) examined the performance of expressive signals (such
as adjectives, emoticon, emphatic and onomatopoeic expressions and expressive lengthening) in sentiment analysis in
conjunction with classification algorithms and ensemble classification schemes. The experimental results indicated that ad-
jectives are more discriminative expressive signals. In another work, Liu et al. (2016) presented a multi-swarm optimization
based feature selection algorithm to identify the discriminative features in text sentiment classification. Appel, Chiclana,
Carter, and Fujita (2016) proposed a hybrid approach for sentence-level sentiment analysis, which is based on sentiment
lexicon and fuzzy sets. Yoon, Kim, Kim, and Song (2016) presented a hybrid classification scheme for Twitter data based on
regression and the latent Dirichlet allocation topic modeling.
3. Research objectives
The aim of the study is to examine the performance of ensemble pruning algorithms in text sentiment classification and
to develop an efficient ensemble classification scheme for sentiment analysis. The research objectives of this study are as
follows:
1. Investigate the performance of conventional ensemble pruning schemes on text sentiment classification: In order to
examine the performance of conventional ensemble pruning algorithms, three ensemble pruning algorithms, namely,
ensemble selection from libraries of models, Bagging ensemble selection and LibD3C algorithm, are considered.
2. Examine the performance of different clustering algorithms and their combination via consensus clustering in en-
semble pruning: For this research objective, we have analyzed the performance of K-means, K-means++, expectation
maximization, self-organizing maps and their subsets.
3. Determine an appropriate rule for selecting candidate classifiers from the clusters: We have analyzed the following 7
rules: selecting one classifier at random from each cluster, selecting two classifiers at random from each cluster, selecting
two classifiers with the highest predictive performance from each cluster, selecting classifiers from clusters proportional
to their predictive performance, selecting based on Q-statistics, disagreement measure and double-fault measure.
4. Examine the performance of search algorithms in exploring the search space of candidate classifiers: In this regard, best
first search, greedy search algorithm, genetic algorithm, particle swarm optimization, differential evolution and ENORA
algorithm are evaluated.
5. Design an effective hybrid ensemble pruning scheme, which integrates clustering and randomized search: The different
combinations of clustering algorithms, selection rules and search algorithms are evaluated. The highest predictive
performance is obtained with the combination of consensus clustering with the elitist pareto-based multi-objective
evolutionary algorithm and Q-statistics based selection.
4. Theoretical foundations
This section presents the ensemble pruning methods utilized in the empirical analysis, the paradigm of consensus clustering, and the clustering and metaheuristic algorithms utilized in the proposed ensemble pruning approach.
In order to assess the performance of the proposed ensemble pruning approach, we have utilized three ensemble
pruning methods, namely ensemble selection from libraries of models (Caruana et al., 2004), Bagging ensemble selection
(Sun and Pfahringer, 2011) and LibD3C algorithm (Lin et al., 2014). The ensemble selection from libraries of models is
selected in the empirical comparisons, since it is one of the most well-known, standard ensemble pruning methods in
the literature (Mendes-Moreira et al., 2012; Rokach, 2010). Bagging ensemble selection is a recent approach based on the
ensemble pruning from libraries of models. LibD3C algorithm is a hybrid ensemble pruning algorithm based on clustering
and dynamic selection strategy. Therefore, these algorithms are utilized in the empirical analysis.
Clustering (also known as cluster analysis) is an unsupervised learning task, which assigns data objects into groups such that
the objects within the same cluster are similar to each other as much as possible, whereas the objects of different clusters
are different from each other (Tan, Steinbach, & Kumar, 2005). Cluster ensembles aim to integrate base clustering algorithms to
obtain a more robust final clustering. Consensus clustering is the process of aggregation of multiple clustering results into a
single consolidated clustering so that a more robust and stable clustering result can be obtained (Ghosh & Acharya, 2011).
With the use of consensus clustering, the quality of the final clustering result can be enhanced. The drawbacks of base
clustering algorithms may be eliminated in the final consolidated clustering. The dependency of clustering scheme to noisy
data points and outliers can be reduced (Ghosh & Acharya, 2011; Vega-Pons & Ruiz-Shulcloper, 2011).
Consensus clustering consists of two main phases: cluster generation and the consensus function. In the cluster generation
phase, the diversity is provided by using different clustering algorithms, by using different seeds or parameter values for the
algorithms and by working on different attribute sets. Then, consensus function is utilized to aggregate the multiple cluster-
ing results to obtain the final cluster (Ghaemi, Sulaiman, Ibrahim, & Mustapha, 2009). One of the critical issues in consensus
clustering is the identification of the consensus function to be utilized. The consensus function should effectively aggregate
the results of base clustering algorithms. The consensus functions are grouped into two classes; median partition based
methods and object co-occurrence based methods (Vega-Pons & Ruiz-Shulcloper, 2011). In median partition based methods,
the problem of obtaining the final consolidated clustering is regarded as a median partitioning problem. In this regard, it is
aimed to obtain a final clustering, which maximizes its similarity to the individual clustering results obtained by base clus-
tering algorithms. In order to measure the similarity between the clustering results, an evaluation measure, such as normal-
ized mutual information, utility function, Fowlkes–Mallows index and purity, is utilized (Fowlkes & Mallows, 1983; Mirkin,
2001; Strehl & Ghosh, 2002). In object co-occurrence based methods, the number of occurrences of a particular object within
the same cluster of different clustering results and the number of co-occurrence of two objects within the same cluster can
be taken into consideration. Then, the consensus partition is obtained through a voting process. Relabeling and voting based,
co-association matrix based and graph based methods are representatives of object co-occurrence based methods (Vega-
Pons & Ruiz-Shulcloper, 2011). In relabeling and voting based methods, the corresponding cluster labels among different
partitions are obtained from the labels of base clustering algorithms. Then, the consensus partition is obtained by a voting
scheme. In co-association matrix based methods, a co-association matrix is constructed from different partitions obtained
from the base clustering algorithms. Then, a similarity based clustering algorithm is employed in the co-association matrix to
obtain the final partition. In graph based methods, a weighted graph is utilized to represent multiple clustering results and
the final partition is obtained from the graph structure by minimizing the graph cut (Vega-Pons & Ruiz-Shulcloper, 2011).
In order to construct cluster ensembles, the diversity can be provided with the utilization of different clustering algo-
rithms as the base models. The rest of this section briefly describes the clustering algorithms utilized in the development
of proposed hybrid ensemble pruning approach.
The identification of an optimal subset of classifiers from the classifier ensemble requires an exhaustive search of
possible subsets, which may be a computationally infeasible task. The process of searching for an optimal subset of classifiers
can be handled by metaheuristic algorithms (Sheen, Aishwarya, Anitha, Raghavan, & Bhaskar, 2012). The rest of this section
briefly describes the heuristic and metaheuristic search algorithms utilized in the empirical analysis.
When two individuals have the same Pareto rank, they are examined based on their crowding distance, and the individual with the higher crowding distance is regarded as the better individual.
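For reference, the crowding distance referred to above can be computed as in the following sketch for a set of individuals belonging to the same Pareto front; it is a generic NSGA-II-style illustration, not the authors' implementation of ENORA.

```python
import numpy as np

def crowding_distance(objectives):
    """Crowding distance of each individual within one Pareto front.
    objectives: (n_individuals, n_objectives) array of objective values."""
    objectives = np.asarray(objectives, dtype=float)
    n, m = objectives.shape
    distance = np.zeros(n)
    for k in range(m):
        order = np.argsort(objectives[:, k])
        distance[order[0]] = distance[order[-1]] = np.inf      # boundary solutions are always kept
        span = objectives[order[-1], k] - objectives[order[0], k]
        if span == 0.0 or n < 3:
            continue
        for pos in range(1, n - 1):
            distance[order[pos]] += (objectives[order[pos + 1], k]
                                     - objectives[order[pos - 1], k]) / span
    return distance
```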
5. Proposed ensemble pruning approach based on consensus clustering and ENORA algorithm
The proposed ensemble pruning approach is a hybrid algorithm, which integrates randomized search and clustering based
ensemble pruning paradigms. The general architecture of the proposed ensemble pruning approach is outlined in Fig. 4.
As depicted in Fig. 4, the proposed ensemble pruning approach consists of a nine-stage procedure. The
ensemble pruning approach starts with training each of the base learning algorithms (classification algorithms) within the
model library (as denoted by Stage-1 of Fig. 4). The model library is obtained by taking different classification algorithms
with various parameter values. Bayesian classifiers (such as Bayesian logistic regression, Naïve Bayes), function based
classifiers (such as FLDA, Kernel Logistic Regression, support vector machines, multi-layer perceptron, radial basis function
networks), instance based classifiers (such as k-nearest neighbor algorithm), rule based classifiers (such as FURIA and
RIPPER) and several decision tree classifiers (such as BFTree, functional tree, C4.5, NBTree, Random Forest and Random
Tree) are utilized to obtain the base learning algorithms of model library. These classification algorithms are taken into
consideration with different parameter values. The details of the model library algorithms are outlined in Table 2. Each of
the base learning algorithms of model library is trained on the training set and the predictive performance of each learning
algorithm is obtained. Hence, the predictive performance of each classifier in terms of classification accuracy is obtained at
the end of Stage-1 of Fig. 4.
Then, the classifier-instance matrix is constructed based on the predictive performance of classifiers (as denoted by
Stage-2 of Fig. 4). Let N denote the number of instances within the training set and M denote the number of base
learning algorithms within the model library. An M × N matrix (referred to as the classifier-instance matrix) is constructed based
on the predictive performance of learning algorithms on the training set. In the classifier-instance matrix, a cell of the
matrix contains the value of zero if the instance is incorrectly classified by a particular classifier and it contains the value
of one, otherwise. By constructing a classifier-instance matrix in this manner, a matrix, which summarizes the classification
characteristics of the learning algorithms, is obtained.
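A minimal sketch of this stage is given below, assuming a list of already fitted scikit-learn-style classifiers; the paper's own implementation is in Java/WEKA, so the interface here is purely illustrative.

```python
import numpy as np

def classifier_instance_matrix(classifiers, X_train, y_train):
    """Stage-2: build the M x N classifier-instance matrix, where cell (m, n) is 1
    if classifier m labels training instance n correctly and 0 otherwise."""
    y_train = np.asarray(y_train)
    matrix = np.zeros((len(classifiers), len(y_train)), dtype=int)
    for m, clf in enumerate(classifiers):
        matrix[m] = (clf.predict(X_train) == y_train).astype(int)   # classifiers are assumed to be fitted
    return matrix
```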
In Stage-3 of Fig. 4, clustering algorithms are employed on the classifier-instance matrix to group classifiers (learning
algorithms) into clusters, such that the classifiers with similar classification characteristics are assigned into the same
clusters. In this stage, several different clustering algorithms are applied on the classifier-instance matrix to eliminate the
instability of clusters.
In Stage-4 of Fig. 4, a co-association matrix is constructed from the multiple clustering results. In this scheme, a
co-association based consensus function is utilized. Each cell of the matrix is computed using Eq. (3) (Vega-Pons &
Ruiz-Shulcloper, 2011):
$$ CA_{ij} = \frac{1}{m} \sum_{t=1}^{m} \delta\big(P_t(x_i), P_t(x_j)\big) \qquad (3) $$
where Pt(xi) denotes the cluster label of an object xi in the clustering Pt. If two objects xi and xj have the same cluster label in the clustering Pt, then δ(a, b) = 1; otherwise, δ(a, b) = 0. The co-association matrix summarizes how often particular objects co-occur in the same cluster across the different clustering results.
In Stage-5 of Fig. 4, a single-link clustering algorithm is applied to the co-association matrix to obtain the final partition
(Fred and Jain, 2005). In this way, the final consolidated clustering result is obtained.
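Stages 4 and 5 can be sketched as follows, where each base clustering is represented by a label vector over the M classifiers, Eq. (3) builds the co-association matrix, and SciPy's single-link agglomerative clustering extracts the final partition; the number of final clusters is left here as a free parameter and is not the paper's exact implementation.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def co_association_matrix(partitions):
    """Eq. (3): CA[i, j] is the fraction of the m base partitions that place
    classifiers i and j in the same cluster."""
    partitions = np.asarray(partitions)                     # shape (m, M): one label vector per base clustering
    ca = np.zeros((partitions.shape[1],) * 2)
    for labels in partitions:
        ca += (labels[:, None] == labels[None, :]).astype(float)
    return ca / partitions.shape[0]

def consensus_partition(partitions, n_clusters):
    """Stage-5: single-link agglomerative clustering on the co-association matrix."""
    distance = 1.0 - co_association_matrix(partitions)      # high co-association -> low distance
    np.fill_diagonal(distance, 0.0)
    tree = linkage(squareform(distance, checks=False), method="single")
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```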
After obtaining the final partition from the cluster ensemble, the candidate classifiers are selected from each cluster.
Based on the result of the final consolidated clustering (as denoted by Stage-7 of Fig. 4), two classifiers are selected from
each cluster to obtain the set of candidate classifiers. The identification of classifiers from each cluster is based on the
diversity among the members of each cluster.
As noted before, one of the most critical issues in designing effective ensemble of classifiers is to provide diversity
(Kuncheva and Whitaker, 2003). In this regard, the diversity of the ensemble can be provided by selecting the classifiers from
different clusters. In the proposed scheme, classifiers selected from clusters are regarded as candidate classifiers that may
be included in the final ensemble. Determining which of these classifiers to include in the final ensemble requires enumerating 2^n possible subsets for a candidate set of n classifiers. Hence, a metaheuristic search algorithm is utilized
in this stage. As denoted by Stage-8 of Fig. 4, the search space of candidate classifiers is explored with ENORA algorithm.
Finally, ENORA algorithm returns an optimal subset of classifiers as the pruned ensemble as indicated by Stage-9 of Fig. 4.
Based on the general structure and stages outlined in Fig. 4, an extensive empirical analysis has been carried on text
classification benchmarks to obtain an efficient ensemble scheme. In order to obtain an efficient classifier ensemble by the
proposed scheme, the algorithm to be utilized in the clustering stage, the evaluation criterion or rule to be used in selecting
classifiers from each cluster and the metaheuristic algorithm to be utilized in exploring the search space are critical design
issues. Regarding the algorithm to be utilized in the clustering stage, we have employed K-means, K-means++, expectation
maximization and self-organizing map algorithm as the base clustering algorithms. As noted above, the clustering re-
sults of different clustering algorithms may vary greatly and the performance of cluster-based ensemble pruning algorithms
suffers from instability of clustering. Motivated by this deficiency, we propose a consensus clustering based approach in the
clustering phase. In the empirical analysis, the mentioned clustering algorithms and their possible subsets (2^4 − 1 cases in
total) are taken into consideration. Based on the empirical analysis, the highest predictive performance is obtained by the
consensus clustering that consists of self-organizing map algorithm (SOM), expectation maximization (EM) and K-means++
(KM++).
In order to select classifiers from the clusters, seven different evaluation rules are utilized. The first rule is the selection
of one classifier from each cluster at random, denoted by RAN1. The second rule is the selection of two classifiers from
each cluster at random, denoted by RAN2. The third rule is the selection of the two classifiers with the highest predictive
performance from each cluster, denoted by BEST. The fourth rule, denoted by WS, is the selection of classifiers from clusters in proportion to their predictive performance, so that a classifier with higher predictive performance has a higher chance of being selected. The other selection rules are based on pairwise classifier diversity measures, namely Q-statistics, the disagree-
ment measure and double-fault measure. For each cluster, the pairwise disagreement among members of the cluster is
computed. Then, two members of each cluster with the highest diversity in terms of measure value are selected.
The Q-statistics, the disagreement measure (Dis) and the double-fault measure (DF) between two classifiers Di and Dk are
computed using Eqs. (4), (5) and (6), respectively (Kuncheva and Whitaker, 2003):
$$ Q_{i,k} = \frac{N^{11} N^{00} - N^{01} N^{10}}{N^{11} N^{00} + N^{01} N^{10}} \qquad (4) $$

$$ Dis_{i,k} = \frac{N^{01} + N^{10}}{N^{11} + N^{10} + N^{01} + N^{00}} \qquad (5) $$

$$ DF_{i,k} = \frac{N^{00}}{N^{11} + N^{10} + N^{01} + N^{00}} \qquad (6) $$
where N11, N00, N10 and N01 denote the number of instances correctly classified by both classifiers, the number of instances incorrectly classified by both classifiers, the number of instances correctly classified by Di and incorrectly classified by Dk, and the number of instances correctly classified by Dk and incorrectly classified by Di, respectively.
Based on the empirical analysis of different evaluation rules, the highest predictive performance is obtained from the
Q-statistics based classifier selection from the clusters. Hence, we have utilized this measure in the proposed approach.
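The sketch below computes the pairwise measures of Eqs. (4)-(6) from two rows of the classifier-instance matrix and selects a candidate pair within a cluster; it assumes that, for Q-statistics, the pair with the smallest Q value is treated as the most diverse (lower Q indicates that the two classifiers err on different instances), and the helper names are illustrative.

```python
import numpy as np
from itertools import combinations

def pair_counts(ci, cj):
    """N11, N10, N01, N00 from two 0/1 correctness rows of the classifier-instance matrix."""
    n11 = int(np.sum((ci == 1) & (cj == 1)))
    n10 = int(np.sum((ci == 1) & (cj == 0)))
    n01 = int(np.sum((ci == 0) & (cj == 1)))
    n00 = int(np.sum((ci == 0) & (cj == 0)))
    return n11, n10, n01, n00

def q_statistic(ci, cj):                     # Eq. (4)
    n11, n10, n01, n00 = pair_counts(ci, cj)
    denom = n11 * n00 + n01 * n10
    return (n11 * n00 - n01 * n10) / denom if denom else 0.0

def disagreement(ci, cj):                    # Eq. (5)
    n11, n10, n01, n00 = pair_counts(ci, cj)
    return (n01 + n10) / (n11 + n10 + n01 + n00)

def double_fault(ci, cj):                    # Eq. (6)
    n11, n10, n01, n00 = pair_counts(ci, cj)
    return n00 / (n11 + n10 + n01 + n00)

def most_diverse_pair(cluster_members, matrix):
    """Return the two classifiers of a cluster with the lowest pairwise Q-statistic."""
    return min(combinations(cluster_members, 2),
               key=lambda p: q_statistic(matrix[p[0]], matrix[p[1]]))
```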
With the selection of two members (classifiers) from each cluster based on Q-statistics, a set of candidate classifiers
is obtained. The search space of candidate classifiers can be explored by a heuristic search algorithm. In this stage, six
different search algorithms (best first search, greedy search algorithm, particle swarm optimization, genetic algorithms,
differential evolution algorithm and ENORA algorithm) are considered. Since ENORA algorithm obtains the highest predictive
performance in terms of accuracy, it is utilized as the search algorithm. In ENORA algorithm, the majority voting error
and averaged pairwise measure (Q-statistics) act as the selection criteria. The proposed ensemble pruning approach is
summarized in Fig. 5.
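The two selection criteria can be evaluated for any candidate subset as in the following sketch, where the subset is encoded as a boolean mask over the candidate classifiers and the voting error is derived from the classifier-instance matrix for the two-class case; treating both criteria as minimisation objectives and the encoding itself are assumptions of this illustration, not the authors' exact formulation.

```python
import numpy as np
from itertools import combinations

def pairwise_q(a, b):
    """Eq. (4) on two 0/1 correctness vectors."""
    n11 = np.sum((a == 1) & (b == 1)); n00 = np.sum((a == 0) & (b == 0))
    n10 = np.sum((a == 1) & (b == 0)); n01 = np.sum((a == 0) & (b == 1))
    denom = n11 * n00 + n01 * n10
    return (n11 * n00 - n01 * n10) / denom if denom else 0.0

def subset_objectives(mask, matrix):
    """Objectives for one candidate subset: (majority-voting error, average pairwise Q).
    matrix holds 0/1 correctness values; for two-class data an instance is voted
    correctly only if more than half of the selected classifiers label it correctly."""
    selected = matrix[np.asarray(mask, dtype=bool)]
    voting_error = float(np.mean(selected.sum(axis=0) * 2 <= selected.shape[0]))
    pairs = list(combinations(range(selected.shape[0]), 2))
    avg_q = float(np.mean([pairwise_q(selected[i], selected[j]) for i, j in pairs])) if pairs else 1.0
    return voting_error, avg_q
```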
Table 1
Descriptive information for the datasets (Wang et al., 2014; Whitehead & Yager, 2009).
Dataset Positive class label Negative class label Neutral class label Objective class label
6. Experimental analysis
In order to examine the predictive performance of the proposed ensemble pruning approach on text classification, an
extensive empirical analysis is performed. This section presents the datasets utilized in the analysis, the experimental
procedure and the experimental results.
6.1. Datasets
Our experimental analysis was conducted on twelve public sentiment analysis datasets from several domains. The
datasets include balanced and unbalanced benchmarks. The balanced datasets are Camera, Camp, Doctor, Drug, Laptop,
Lawyer, Radio, TV and Music datasets (Whitehead & Yager, 2009). Camera dataset contains digital camera evaluations ex-
tracted from Amazon.com. The Camp dataset contains evaluations for summer camps extracted from CampRatingz.com. The Doctor dataset contains doctor evaluations retrieved from RateMDs.com. The Drug dataset is obtained from DrugRatingz.com. The Laptop and Music datasets are also obtained from Amazon.com. The Lawyer dataset is obtained from LawyerRatingz.com.
The radio data set is obtained from RadioRatingz.com. Finally, the TV data set is obtained from TVRatingz.com. In order
to examine the predictive performance on unbalanced datasets, hotel reviews dataset (Zhang, Ma, Yi, Niu, & Xu, 2015),
Twitter person dataset and Twitter movie dataset (Chen, Wang, Nagarajan, Wang, & Sheth, 2012) are utilized. The nine
balanced sentiment analysis datasets generally have the same number of instances with positive and negative class labels.
The basic descriptive information about the datasets utilized in the experimental analysis is presented in Table 1 (Onan &
Korukoğlu, 2015). All sentiment classification datasets excluding the Twitter person and Twitter movie datasets include two class labels (positive and negative), while the Twitter person and Twitter movie datasets include four class labels (positive, negative, neutral and objective).
To evaluate the predictive performance of the ensemble pruning algorithms, classification accuracy (ACC) is utilized as the evaluation measure. It is one of the most widely utilized measures in the performance evaluation of classification algorithms. It is the proportion of true positives and true negatives among the total number of instances, as given by Eq. (7):
$$ ACC = \frac{TN + TP}{TP + FP + FN + TN} \qquad (7) $$

where TN, TP, FP and FN represent the number of true negatives, true positives, false positives and false negatives, respectively.
In the experiments, 10-fold cross-validation method is utilized. In this scheme, the original dataset is randomly par-
titioned into 10 equal sized partitions. Each time, one of the partitions is used for validation and the remainder of the
partitions is utilized for training. The process is repeated ten times and the average results across all trials are reported.
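A minimal sketch of this evaluation protocol with scikit-learn is given below; the experiments in the paper are run with WEKA, so the function and model arguments here are illustrative only.

```python
from sklearn.model_selection import cross_val_score

def ten_fold_accuracy(model, X, y):
    """Mean and standard deviation of classification accuracy (Eq. (7))
    over stratified 10-fold cross-validation."""
    scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
    return scores.mean(), scores.std()
```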
The classification algorithms and the compared ensemble pruning algorithms (Ensemble selection from libraries of models
(ESM), Bagging ensemble selection (BES) and LibD3C algorithm) are implemented with WEKA (Waikato Environment for
Knowledge Analysis) version 3.7.11, which is an open-source platform that contains machine learning algorithms imple-
mented in Java (Hall et al., 2009; “Weka 3: Data Mining Software in Java”, 2016). The proposed ensemble pruning scheme
is implemented in Java. Table 2 presents the classification algorithms (base learning algorithms) which are utilized to build
the model library. For ESM and LibD3C algorithms, the same model library presented in Table 2 is utilized.
Table 2
Classification algorithms used to build the model library.
Bayesian classifiers (5) Bayesian logistic regression (with norm-based hyper-parameter selection), Bayesian logistic regression
(with cross-validated hyper-parameter selection), Bayesian logistic regression (with specific value based
hyper-parameter selection), Naive Bayes, Naive Bayes multinomial
Function based classifiers (14) FLDA, Kernel Logistic Regression (with Poly Kernel), Kernel Logistic Regression (with Normalized Poly Kernel),
LibLINEAR (with L2-regularized logistic regression), LibLINEAR (with L2-regularized L2-loss support vector
classification), LibLINEAR (with L1-regularized logistic regression), LibSVM (with radial basis function),
LibSVM (with linear Kernel), LibSVM (with polynomial Kernel), LibSVM (with sigmoid Kernel), Multi-layer
perceptron, radial basis function networks, Logistic regression, Gaussian radial basis function networks
Instance based classifiers (10) KNN (with K: 1), KNN (with K:2), KNN (with K:3), KNN (with K: 4), KNN (with K:5), KNN (with K:6), KNN
(with K:7), KNN (with K:8), KNN (with K:9), KNN (with K:10)
Rule based classifiers (3) FURIA (with product T-norm), FURIA (with minimum T-norm), RIPPER
Decision tree classifiers (8) BFTree (Unpruned), BFTree (post-pruning), BFTree (pre-pruning), Functional Tree, C4.5 (J48), NBTree, Random
Forest, Random Tree
Table 3
Parameters of the search algorithms.
Best first search Direction of the search: forward, the maximum size of the lookup cache of evaluated subsets: 1, the number
of consecutive non-improving nodes to allow: 5
Greedy search algorithm The number of execution slots: 1, attributes to be retained: all
Particle swarm optimization Individual weight: 0.34, Inertia weight: 0.33, Number of iterations: 20, Mutation probability: 0.01, Mutation
type: bit-flip, Population size: 20, Seed: 1, Social Weight: 0.33
Genetic algorithms Crossover probability: 0.6, The number of generations to evaluate: 20, Mutation probability: 0.033, Population
size: 20, Seed: 1
Differential evolution algorithm Scaling factor (β) = 0.5, population size = 100, number of generations = 50
ENORA algorithm The number of generations to evolve the population: 10, Population size: 100, Seed: 1
For Bagging ensemble selection, a tree-based BES implementation is utilized, which is available (“Bagging Ensemble Selection - a new
ensemble learning strategy”, 2016). In the empirical analysis, ESM, BES and LibD3C algorithms are evaluated with different
parameter values. For ESM algorithm, forward selection, backward elimination, forward-backward selection and the best
model schemes are evaluated to optimize the ensemble. In addition, root mean squared error (RMSE), accuracy (ACC), ROC
area, precision, recall and F-measure are utilized as the metric to be used to optimize the classifier ensemble. In this way,
24 different configurations are evaluated for ESM algorithm. For BES algorithm, different sizes of bags ranging from 10 to
100 are examined. In BES algorithm, root mean squared error (RMSE), accuracy (ACC), ROC area, precision, recall, F-measure
and the combination of all metrics are utilized as the evaluation measures. In this way, eighty different configurations are
considered for BES algorithm. For LibD3C algorithm, five different ensemble combination rules (average of probabilities,
product of probabilities, majority voting, minimum probability and maximum probability) are considered. The results given
in the experimental analysis present the highest predictive performance obtained by these algorithms.
As noted before, the proposed ensemble pruning algorithm combines clustering algorithms with randomized search
based approaches. To examine the performance of different clustering algorithms, K-means, K-means++, expectation
maximization and self-organizing map algorithm are utilized as the base clustering algorithms and their possible subsets
(2^4 − 1 cases in total) are examined. To select the classifiers from each cluster, seven different evaluation rules (RAN1, RAN2,
BEST, WS, Q-statistics, disagreement measure and double-fault measure) presented above are applied. To explore the search
space, best first search, greedy search algorithm, particle swarm optimization, genetic algorithm, differential evolution and
ENORA algorithms are utilized. The parameters of search algorithms are given in Table 3.
In addition, the performance of the proposed ensemble pruning scheme is also compared with the conventional ensem-
ble learning methods (AdaBoost, Bagging and Random Subspace). For these methods, the Naïve Bayes algorithm is
utilized as the base learning algorithm.
6.4. Results
In the extensive empirical analysis of the proposed model, different clustering algorithms (K-means, K-means++,
expectation maximization, self-organizing maps and their subsets) are evaluated to form a cluster ensemble. In addition,
7 different rules (RAN1, RAN2, BEST, WS, Q-statistics, Dis and DF) are evaluated to select the candidate classifiers from
each cluster. Besides, the performance of 6 search algorithms (best first search, greedy search algorithm, genetic algorithm,
particle swarm optimization, differential evolution and ENORA algorithm) is presented. In the tables, the best (the highest)
results obtained by a particular algorithm are indicated as both boldface and underline and the second best results obtained
by a particular algorithm are indicated as both boldface and italics.
The first design issue of the study is to determine the rule to be employed for selecting candidate classifiers from each
cluster. In Table 4, the average classification accuracies obtained with different selection rules are presented.
Table 4
Classification accuracies obtained with different selection rules.
Camera 92,907 ± 0,0867 91,812 ± 0,0630 91,267 ± 0,0671 90,705 ± 0,0668 90,154 ± 0,0678 89,578 ± 0,0976 88,196 ± 0,105
Camp 94,132 ± 0,0875 92,944 ± 0,0649 92,369 ± 0,0669 91,790 ± 0,0675 91,185 ± 0,0692 90,528 ± 0,100 89,326 ± 0,0976
Doctor 92,991 ± 0,0894 91,768 ± 0,0643 91,194 ± 0,0664 90,654 ± 0,0674 90,108 ± 0,0673 89,578 ± 0,0976 88,312 ± 0,106
Drug 91,941 ± 0,0847 90,846 ± 0,0639 90,254 ± 0,0664 89,708 ± 0,0669 89,201 ± 0,0670 88,683 ± 0,0970 87,368 ± 0,102
Laptop 96,341 ± 0,0826 95,490 ± 0,0594 95,114 ± 0,0661 94,746 ± 0,0660 94,343 ± 0,0672 93,932 ± 0,0954 93,086 ± 0,0852
Lawyer 94,617 ± 0,0884 93,487 ± 0,0634 92,921 ± 0,0658 92,411 ± 0,0661 91,908 ± 0,0682 91,313 ± 0,0974 89,909 ± 0,105
Music 92,169 ± 0,0809 91,055 ± 0,0674 90,448 ± 0,0696 89,716 ± 0,0697 89,000 ± 0,0708 88,281 ± 0,107 86,644 ± 0,120
Radio 91,098 ± 0,0869 89,922 ± 0,0628 89,376 ± 0,0666 88,853 ± 0,0664 88,335 ± 0,0671 87,795 ± 0,0961 86,611 ± 0,0892
TV 94,042 ± 0,0772 93,118 ± 0,0623 92,591 ± 0,0659 92,093 ± 0,0663 91,599 ± 0,0678 91,032 ± 0,0945 89,892 ± 0,113
Hotel reviews 87,743 ± 0,936 85,734 ± 0,697 85,359 ± 0,775 84,255 ± 0,533 83,317 ± 0,637 84,438 ± 0,659 82,718 ± 0,57
Twitter person 73,595 ± 0,677 72,425 ± 0,73 71,740 ± 0,544 71,378 ± 0,778 71,084 ± 0,519 70,921 ± 0,418 71,016 ± 0,817
Twitter movie 69,380 ± 0,6 68,637 ± 0,666 64,077 ± 0,919 62,421 ± 0,867 60,765 ± 0,612 60,708 ± 0,632 60,176 ± 0,404
Table 5
Classification results obtained with different search algorithms.
Camera 91,124 ± 0,161 90,851 ± 0,145 90,746 ± 0,148 90,595 ± 0,150 90,431 ± 0,156 90,212 ± 0,174
Camp 92,225 ± 0,169 91,945 ± 0,152 91,835 ± 0,153 91,681 ± 0,153 91,521 ± 0,158 91,313 ± 0,173
Doctor 91,134 ± 0,164 90,857 ± 0,146 90,739 ± 0,146 90,589 ± 0,147 90,427 ± 0,152 90,202 ± 0,172
Drug 90,165 ± 0,158 89,909 ± 0,143 89,802 ± 0,144 89,646 ± 0,146 89,486 ± 0,151 89,280 ± 0,168
Laptop 95,071 ± 0,131 94,851 ± 0,109 94,772 ± 0,111 94,673 ± 0,112 94,554 ± 0,115 94,409 ± 0,125
Lawyer 92,831 ± 0,160 92,571 ± 0,144 92,453 ± 0,147 92,298 ± 0,149 92,131 ± 0,154 91,916 ± 0,173
Music 90,137 ± 0,177 89,848 ± 0,168 89,731 ± 0,171 89,543 ± 0,176 89,353 ± 0,185 89,085 ± 0,210
Radio 89,296 ± 0,158 89,034 ± 0,144 88,925 ± 0,144 88,774 ± 0,145 88,634 ± 0,147 88,472 ± 0,154
TV 92,431 ± 0,143 92,220 ± 0,131 92,140 ± 0,135 92,013 ± 0,136 91,877 ± 0,139 91,633 ± 0,168
Hotel reviews 87,351 ± 0,8 85,642 ± 0,809 84,562 ± 0,576 84,707 ± 0,649 84,704 ± 0,552 84,722 ± 0,507
Twitter person 72,829 ± 0,72 72,517 ± 0,552 72,380 ± 0,596 72,374 ± 0,319 71,315 ± 0,834 70,232 ± 0,822
Twitter movie 66,736 ± 0,432 64,294 ± 0,616 61,827 ± 0,539 61,356 ± 0,686 60,659 ± 0,919 60,318 ± 0,537
Standard errors of the means are also given after the accuracy results. As can be observed from the classification accuracies presented in Table 4, the highest predictive performance is obtained with the use of Q-statistics based selection of candidate classifiers from each cluster. The second highest predictive performance is obtained with the disagreement measure (Dis) based selection and the worst predictive performance is obtained with the selection of one classifier from each cluster at random (RAN1).
The second design issue of the study is to determine the search algorithm to be utilized to explore the search space of
candidate classifiers. In Table 5, the average classification accuracies obtained with different search algorithms are presented. As can be observed from the classification accuracies presented in Table 5, the highest predictive performance is obtained
when ENORA algorithm is utilized to explore the search space of candidate classifiers. The second highest predictive
performance is obtained with the differential evolution algorithm and the worst predictive performance is obtained with
the greedy search algorithm.
The third design issue of the study is to determine the clustering algorithm utilized to cluster the classifiers into groups based on the similarity of their performance. In this regard, the 15 different configurations listed in Table 6 are considered. The highest predictive performance among the compared clustering algorithms is obtained with the use of a cluster ensemble of self-organizing maps, expectation maximization and K-means++. The second highest predictive performance is obtained with a cluster ensemble of the K-means, expectation maximization and K-means++ algorithms. The consensus clustering based approaches generally yield higher predictive performance in terms of classification accuracy owing to the more stable and robust clusters they generate.
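A common way to build such a cluster ensemble is evidence accumulation over a co-association matrix (Fred & Jain, 2005): the matrix records how often two classifiers fall into the same cluster across the base clusterings, and a final partition is extracted from it. The sketch below is our own simplified illustration of that idea, not the exact consensus function of the proposed scheme, and assumes each base clustering is given as a label vector over the classifiers.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def co_association(labelings):
    """Fraction of base clusterings that put each pair of classifiers in the same cluster."""
    labelings = np.asarray(labelings)          # shape: (n_clusterings, n_classifiers)
    n = labelings.shape[1]
    co = np.zeros((n, n))
    for labels in labelings:
        co += (labels[:, None] == labels[None, :]).astype(float)
    return co / len(labelings)

def consensus_clusters(labelings, n_clusters):
    """Derive a consensus partition by average-linkage clustering of 1 - co-association."""
    dist = 1.0 - co_association(labelings)
    cond = dist[np.triu_indices_from(dist, k=1)]   # condensed distance vector
    return fcluster(linkage(cond, method="average"), t=n_clusters, criterion="maxclust")

# Toy usage: partitions of 6 classifiers produced by, e.g., KM, EM and SOM
base = [[0, 0, 1, 1, 2, 2],
        [0, 0, 1, 1, 1, 2],
        [0, 0, 1, 1, 2, 2]]
print(consensus_clusters(base, n_clusters=3))
```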
Based on the extensive empirical analysis, the highest predictive performance is obtained with the use of consensus clustering (SOM+EM+KM++) in the clustering stage, the Q-statistics based selection rule to determine candidate classifiers from each cluster, and the ENORA algorithm to explore the search space of candidate classifiers. In Table 7, the performance of the proposed ensemble pruning approach based on consensus clustering and the ENORA algorithm is compared with three state-of-the-art ensemble learning methods (AdaBoost, Bagging and Random Subspace) and three ensemble pruning methods (ensemble selection from libraries of models, Bagging ensemble selection, and the LibD3C algorithm). Regarding the performance of the conventional ensemble learning methods, the highest performance on the Camera, Drug, Lawyer, Music and Radio datasets is obtained with the Bagging method. The highest predictive performance on the Camp, Doctor and Laptop datasets is obtained with the AdaBoost algorithm, and the highest predictive performance on the TV dataset is obtained with the Random Subspace algorithm. Regarding the performance of the conventional ensemble pruning algorithms, the LibD3C algorithm generally outperforms ensemble selection from libraries of models and Bagging ensemble selection. As can be observed from the results presented in Table 7, the proposed ensemble pruning scheme based on consensus clustering and the ENORA algorithm yields the highest predictive performance compared to the conventional ensemble learning and ensemble pruning algorithms.
Table 6
Classification results obtained with different clustering algorithms.
Clustering algorithm Camera Camp Doctor Drug Laptop Lawyer Music Radio TV Hotel reviews Twitter person Twitter movie
KM 90,262 ± 0,244 91,359 ± 0,249 90,251 ± 0,242 89,327 ± 0,239 94,357 ± 0,170 91,972 ± 0,240 89,200 ± 0,294 88,477 ± 0,223 91,613 ± 0,251 83,382 ± 0,682 70,854 ± 0,811 59,746 ± 0,538
KM++ 90,283 ± 0,240 91,389 ± 0,242 90,279 ± 0,238 89,335 ± 0,233 94,353 ± 0,164 91,993 ± 0,238 89,247 ± 0,288 88,492 ± 0,222 91,652 ± 0,236 83,830 ± 0,84 70,934 ± 0,554 59,876 ± 0,756
EM 90,298 ± 0,236 91,403 ± 0,241 90,297 ± 0,236 89,358 ± 0,227 94,365 ± 0,162 92,015 ± 0,232 89,268 ± 0,281 88,519 ± 0,219 91,705 ± 0,211 83,764 ± 0,834 71,019 ± 0,56 60,122 ± 0,634
SOM 90,311 ± 0,235 91,405 ± 0,243 90,312 ± 0,232 89,377 ± 0,226 94,382 ± 0,162 92,019 ± 0,234 89,264 ± 0,285 88,511 ± 0,223 91,726 ± 0,206 83,804 ± 0,635 71,622 ± 0,835 60,228 ± 0,469
SOM+EM 90,327 ± 0,233 91,425 ± 0,239 90,333 ± 0,230 89,385 ± 0,225 94,396 ± 0,160 92,026 ± 0,234 89,290 ± 0,279 88,528 ± 0,222 91,733 ± 0,205 83,436 ± 0,806 72,156 ± 0,644 60,342 ± 0,505
SOM+KM++ 90,340 ± 0,232 91,427 ± 0,238 90,334 ± 0,229 89,402 ± 0,224 94,396 ± 0,164 92,047 ± 0,231 89,298 ± 0,279 88,532 ± 0,221 91,748 ± 0,207 84,795 ± 0,79 72,272 ± 0,75 60,566 ± 0,818
EM+KM++ 90,390 ± 0,230 91,483 ± 0,239 90,399 ± 0,231 89,434 ± 0,223 94,430 ± 0,163 92,098 ± 0,229 89,355 ± 0,273 88,587 ± 0,223 91,781 ± 0,204 84,340 ± 0,518 71,070 ± 0,342 60,593 ± 0,413
KM+EM 90,378 ± 0,230 91,466 ± 0,242 90,377 ± 0,231 89,444 ± 0,228 94,403 ± 0,162 92,093 ± 0,230 89,338 ± 0,277 88,572 ± 0,225 91,764 ± 0,204 85,036 ± 0,538 71,548 ± 0,445 60,540 ± 0,654
KM+KM++ 90,369 ± 0,231 91,458 ± 0,238 90,371 ± 0,228 89,411 ± 0,222 94,433 ± 0,156 92,076 ± 0,230 89,314 ± 0,276 88,572 ± 0,217 91,751 ± 0,202 84,226 ± 0,741 71,683 ± 0,866 60,703 ± 0,806
SOM+KM 90,352 ± 0,231 91,444 ± 0,240 90,350 ± 0,227 89,409 ± 0,226 94,443 ± 0,161 92,054 ± 0,231 89,292 ± 0,276 88,540 ± 0,219 91,758 ± 0,202 84,285 ± 0,553 71,760 ± 0,711 60,833 ± 0,601
SOM+EM+KM++ 91,949 ± 0,231 93,029 ± 0,235 91,949 ± 0,226 90,985 ± 0,219 96,000 ± 0,165 93,661 ± 0,233 90,903 ± 0,266 90,123 ± 0,217 93,339 ± 0,203 87,635 ± 0,489 73,131 ± 0,606 63,700 ± 0,415
KM+EM+KM++ 90,402 ± 0,230 91,496 ± 0,237 90,386 ± 0,227 89,459 ± 0,226 94,456 ± 0,164 92,110 ± 0,228 89,356 ± 0,277 88,584 ± 0,218 91,808 ± 0,202 85,740 ± 0,672 72,166 ± 0,393 61,064 ± 0,784
SOM+KM+EM 90,413 ± 0,230 91,507 ± 0,237 90,412 ± 0,230 89,460 ± 0,222 94,470 ± 0,164 92,106 ± 0,225 89,369 ± 0,271 88,598 ± 0,217 91,807 ± 0,199 84,861 ± 0,654 72,709 ± 0,477 61,407 ± 0,469
SOM+KM+KM++ 91,921 ± 0,226 93,013 ± 0,234 91,921 ± 0,225 90,972 ± 0,219 95,989 ± 0,164 93,622 ± 0,221 90,884 ± 0,267 90,108 ± 0,216 93,304 ± 0,194 87,196 ± 0,596 72,913 ± 0,539 62,049 ± 0,553
SOM+EM+KM+KM++ 91,905 ± 0,225 92,997 ± 0,233 91,899 ± 0,223 90,959 ± 0,218 95,951 ± 0,157 93,608 ± 0,221 90,863 ± 0,268 90,093 ± 0,215 93,292 ± 0,195 86,425 ± 0,629 72,747 ± 0,857 62,016 ± 0,543
Table 7
Comparison of the proposed model with the state of the art ensemble learning techniques and ensemble pruning schemes.
Datasets AdaBoost Bagging Random Subspace ESM BES LibD3C Our model
Table 8
Two-way ANOVA test results.
Source DF SS MS F P
In Fig. 6, the predictive performance of the compared algorithms on the different datasets is presented. Fig. 6 indicates that the proposed model yields the highest predictive performance on all of the datasets examined.
The empirical analysis is conducted on nine balanced datasets (i.e., Camera, Camp, Doctor, Drug, Laptop, Lawyer, Radio, TV and Music) and three unbalanced datasets (i.e., Hotel reviews, Twitter person and Twitter movie). As listed in Table 4, the highest classification accuracies are obtained by the Q-statistics selection rule for both balanced and unbalanced datasets. Similarly, the ENORA algorithm yields the highest predictive performance in exploring the search space of candidate classifiers for both balanced and unbalanced datasets. Finally, the consensus clustering based on the self-organizing map, expectation maximization and K-means++ algorithms yields the highest predictive performance in clustering the classifiers into groups. As summarized in the experimental results listed in Tables 4–7, the proposed classification scheme can also yield promising results on unbalanced datasets.
To further evaluate the results obtained in the experimental analysis, we have performed the two-way ANOVA test in the
Minitab statistical program. The results for the two-way ANOVA test of overall results are presented in Table 8, where DF,
SS, MS, F and P denote degrees of freedom, adjusted sum of squares, adjusted mean square, F-Value and probability value,
respectively. According to the results of the two-way ANOVA test, there are statistically significant differences between the predictive performances of the compared methods (p < 0,0001).
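For readers who wish to reproduce the test outside Minitab, a two-way ANOVA with method and dataset as the two factors and accuracy as the response can be run with standard statistical libraries. The sketch below is our own illustration using statsmodels; the small accuracy table inside it is placeholder data, not the values reported in this study.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Placeholder accuracies (percent); in the real analysis each cell would come from Table 7.
records = [
    ("AdaBoost", "Camera", 88.1), ("AdaBoost", "Camp", 89.0), ("AdaBoost", "Doctor", 87.5),
    ("Bagging",  "Camera", 88.9), ("Bagging",  "Camp", 89.4), ("Bagging",  "Doctor", 88.0),
    ("Proposed", "Camera", 91.9), ("Proposed", "Camp", 93.0), ("Proposed", "Doctor", 91.9),
]
df = pd.DataFrame(records, columns=["method", "dataset", "accuracy"])

# Two-way ANOVA with method and dataset as factors (no interaction term,
# since there is a single observation per method/dataset cell here).
model = ols("accuracy ~ C(method) + C(dataset)", data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)   # DF, sum of squares, F and p per source
print(anova_table)
```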
In Fig. 7, the confidence intervals for the mean classification accuracies obtained by the compared algorithms, at a confidence level of 95%, are presented. Based on the statistical significance of the differences between the results, Fig. 7 is divided into two regions denoted by the red dashed line. As can be observed from Fig. 7, the proposed ensemble pruning scheme lies in a separate region of the interval plot. Hence, the higher predictive performance obtained by the proposed method is statistically significantly better than the performance of the other methods.
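The interval plotted for each method in Fig. 7 is a 95% confidence interval around its mean accuracy across the datasets. As a reference, a minimal sketch of the corresponding t-based computation is given below (our own illustration with placeholder accuracy values; the exact interval construction used by Minitab is assumed to be the standard one).

```python
import numpy as np
from scipy import stats

def mean_ci(accuracies, confidence=0.95):
    """t-based confidence interval for the mean of a sample of accuracies."""
    acc = np.asarray(accuracies, dtype=float)
    mean = acc.mean()
    sem = stats.sem(acc)                       # standard error of the mean
    half_width = sem * stats.t.ppf((1 + confidence) / 2, df=len(acc) - 1)
    return mean - half_width, mean + half_width

# Placeholder: one accuracy value per dataset for a single method
print(mean_ci([91.9, 93.0, 91.9, 91.0, 96.0, 93.7, 90.9, 90.1, 93.3]))
```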
7. Discussions
In this section, we discuss the performance of the different components employed at each stage of the proposed ensemble pruning method. The proposed ensemble pruning method is a hybrid method based on consensus clustering and the elitist Pareto-based multi-objective evolutionary algorithm. As mentioned earlier, the algorithms to be utilized in the clustering stage, the evaluation measures/rules to be used in selecting classifiers from each cluster, and the metaheuristic algorithms to be employed for exploring the search space of candidate classifiers are essential factors in building an efficient classifier ensemble.
In this regard, an extensive empirical analysis has been carried out on balanced and unbalanced text sentiment benchmarks. In order to evaluate the effect of the rules employed for selecting candidate classifiers from each cluster, we compare seven evaluation measures/rules (i.e., Q-statistics, Dis, DF, WS, BEST, RAN2 and RAN1). In Table 4, we report the predictive performance of the seven evaluation measures on the text benchmarks. In general, we draw the conclusion that the selection of candidate classifiers based on Q-statistics outperforms the other evaluation rules. In addition, selection rules based on diversity measures (such as Q-statistics, Dis and DF) outperform the schemes based on random selection of classifiers from each cluster.
In order to evaluate the effect of the metaheuristic algorithms used to explore the search space of candidate classifiers, six metaheuristic algorithms (denoted by ENORA, DE, GA, PSO, BFS and GS) are considered. Regarding the performance of the metaheuristic algorithms, the ENORA algorithm outperforms the single-objective metaheuristic algorithms.
Regarding the algorithm to be utilized in the clustering stage, we have employed the K-means, K-means++, expectation maximization and self-organizing map algorithms as the base clustering algorithms. As noted earlier, the clustering results of different clustering algorithms may vary greatly, and the performance of cluster-based ensemble pruning algorithms suffers from the instability of clustering. Motivated by this deficiency, we propose a consensus clustering based approach in the clustering phase. In the empirical analysis, the mentioned clustering algorithms and all their non-empty subsets (2⁴ − 1 = 15 cases in total) are taken into consideration. Based on the empirical analysis, the highest predictive performance is obtained by the consensus clustering that consists of the self-organizing map algorithm (SOM), expectation maximization (EM) and K-means++ (KM++).
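The count of 2⁴ − 1 = 15 configurations simply enumerates every non-empty subset of the four base clustering algorithms; a short enumeration such as the following (our own illustration) reproduces the configurations listed in Table 6.

```python
from itertools import combinations

base_clusterers = ["KM", "KM++", "EM", "SOM"]
configs = [subset
           for r in range(1, len(base_clusterers) + 1)
           for subset in combinations(base_clusterers, r)]
print(len(configs))            # 15 non-empty subsets
print(configs[-1])             # ('KM', 'KM++', 'EM', 'SOM'), the full ensemble
```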
The results of the experimental analysis indicate that the consensus clustering can be utilized to obtain stable clusters
in ensemble pruning. The consensus clustering scheme integrates self-organizing maps, expectation maximization and K-
means++ algorithms to generate the cluster ensemble. Moreover, increasing the number of clustering algorithms utilized in
the cluster ensemble does not necessarily increase the clustering quality of the ensemble. The classifiers of the model library
are clustered into groups with the use of cluster ensemble. Then, two candidate classifiers are selected from each cluster
based on the Q-statistics diversity measure. Regarding the performance of the different diversity measures, Q-statistics performs relatively well in obtaining diverse classifiers. In addition, the elitist Pareto-based multi-objective evolutionary algorithm can effectively explore the search space of possible classifiers. The performance of the proposed ensemble pruning scheme is evaluated against the state-of-the-art ensemble methods (AdaBoost, Bagging and Random Subspace) and the ensemble pruning algorithms (ensemble selection from libraries of models, Bagging ensemble selection and the LibD3C algorithm).
In sentiment analysis, the distribution of the polarities can be very unbalanced. In addition to the nine balanced text benchmarks, the predictive performance on three unbalanced text benchmarks is also evaluated. We find that the proposed ensemble pruning scheme outperforms the other evaluated configurations for both balanced and unbalanced text benchmarks. Hence, the proposed classification scheme can also yield promising results on unbalanced datasets. The performance analysis on twelve public sentiment classification datasets indicates that the proposed scheme outperforms the compared methods in terms of predictive performance. In addition, the experimental analysis indicates that hybrid ensemble pruning schemes yield more promising results in text classification.
Many papers on sentiment analysis have examined the utilization of ensemble learning to enhance the predictive perfor-
mance (Prabowo & Thelwall, 2009; Wang et al., 2014; Xia et al., 2011). Our study differs from the existing studies in several
ways. As outlined in our literature review, most works on sentiment analysis have focused on feature engineering based
performance improvement. There are several works on ensemble learning for text or sentiment classification. However, ensemble pruning remains underexplored in sentiment analysis and text classification. In this regard, this paper presents an efficient ensemble pruning based classification scheme for sentiment analysis. The presented ensemble scheme is formulated based on an extensive empirical analysis with different selection rules, clustering algorithms and search algorithms. To the best of our knowledge, this is the first ensemble pruning study that i) employs consensus clustering to obtain stable clustering results, ii) utilizes the elitist Pareto-based multi-objective evolutionary algorithm to explore the search space of possible classifiers, and iii) analyzes the performance of different selection rules, search algorithms and clustering methods for text classification.
Several limitations characterize our research. The presented ensemble pruning scheme is developed based on extensive
empirical analysis on twelve text classification benchmarks. The predictive performance of sentiment analysis is mainly
influenced by the feature engineering involved in representing text documents. Hence, different representation schemes can
be considered in conjunction with the presented scheme.
There are also a number of practical implications of the research. The identification of an appropriate subset of classifiers
is a critical issue in developing robust ensemble classifiers, which is of great importance in many application fields of
ensemble learning. The experimental analysis validates the effectiveness of the hybrid ensemble pruning scheme on text
classification over conventional ensemble methods and ensemble pruning algorithms. The presented hybrid ensemble
pruning scheme may be adapted to other classification tasks.
8. Conclusions
The work describes an ensemble pruning based classification scheme for text sentiment classification. We propose a
hybrid ensemble pruning model that employs clustering and randomized search. To overcome the instability problem of
clustering results, consensus clustering is utilized. With the use of clustering algorithms, the classifiers of the ensemble
are grouped into clusters based on their predictive characteristics. Then, two classifiers from each cluster are selected as
candidate classifiers and the search space of candidate classifiers is explored by the elitist Pareto-based multi-objective evo-
lutionary algorithm. The proposed ensemble pruning scheme is evaluated on twelve balanced and unbalanced benchmark
text classification tasks. The empirical results indicate that the presented scheme can yield promising results for sentiment analysis in comparison to conventional ensemble methods (i.e., AdaBoost, Bagging and Random Subspace) and three ensemble pruning algorithms (i.e., ensemble selection from libraries of models, Bagging ensemble selection and the LibD3C algorithm).
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
References
Aksela, M. (2003). Comparison of classifier selection methods for improving committee performance. In T. Windeatt, & F. Roli (Eds.), Multiple classifier
systems (pp. 84–93). Berlin: Springer Verlag.
Appel, O., Chiclana, F., Carter, J., & Fujita, H. (2016). A hybrid approach to the sentiment analysis problem at the sentence level. Knowledge-Based Systems,
108, 110–124 Advance online publication. doi:10.1016/j.knosys.2016.05.040.
Arthur, D., & Vassilvitskii, S. (2007). K-means++: The advantage of careful seeding. In Proceedings of the eighteenth annual symposium on discrete algorithms
(pp. 1027–1035).
Bhatia, M. P. S., & Khalid, A. K. (2008). Information retrieval and machine learning: Supporting technologies for web mining research and practice. Webology,
5, 2–19.
Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140.
Caruana, R., Niculescu-Mizil, A., Crew, G., & Ksikes, A. (2004). Ensemble selection from libraries of models. In Proceedings of ICML 04 (pp. 18–38).
Cavalcanti, C. D. C., Oliveira, L. S., Moura, T. J. M., & Carvalho, G. V. (2016). Combining diversity measures for ensemble pruning. Pattern Recognition Letters,
74, 38–45.
Chen, L., Wang, W., Nagarajan, M., Wang, S., & Sheth, A. P. (2012). Extracting diverse sentiment expressions with target-dependent polarity from Twitter. In
Proceedings of the sixth international AAAI conference on weblogs and social media (pp. 50–57).
Cheng, L., Wang, Y., Hou, Z-G., Tan, M., & Cao, Z. (2013). Sampled-data based average consensus of second-order integral multi-agent systems: Switching
topologies and communication noises. Automatica, 49(5), 1458–1464.
Coelho, G. P., & Von Zuben, F. J. (2006). The influence of the pool of candidates on the performance of selection and combination techniques in ensembles.
In Proceedings of international joint conference on neural networks (pp. 5132–5139).
Da Silva, N. F. F., Hruschka, E. R., & Hruschka, E. R. (2014). Tweet sentiment analysis with classifier ensembles. Decision Support Systems, 66, 170–179.
Dai, Q. (2013). A novel ensemble pruning algorithm based on randomized greedy selective strategy and ballot. Neurocomputing, 122, 258–265.
Dai, Q., & Liu, Z. (2013). ModEnPBT: A modified backtracking ensemble pruning algorithm. Applied Soft Computing, 13(11), 4292–4302.
Dai, Q., Zhang, T., & Liu, N. (2015). A new reverse reduce-error ensemble pruning algorithm. Applied Soft Computing, 28, 237–249.
del Pilar Salas-Zarate, M., Lopez-Lopez, E., Valencia-Garcia, R., Aussenac-Gilles, N., Almela, A., & Alor-Hernandez, G. (2014). A study on LIWC categories for
opinion mining in Spanish reviews. Journal of Information Science, 40(6), 749–760.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the em algorithm. Journal of the Royal Statistical Society,
39, 1–38.
Dietterich, T. G. (2000). Ensemble methods in machine learning. In Proceedings of the 1st international workshop in multiple classifier systems (pp. 1–15).
Elghazel, H., Aussem, A., Gharroudi, O., & Saadaoui, W. (2016). Ensemble multi-label text categorization based on rotation forest and latent semantic index-
ing. Expert Systems with Applications, 57, 1–11.
Engelbrecht, A. P. (2007). Computational intelligence: An introduction. New York: Wiley (Chapter 13).
Fersini, E., Messina, E., & Pozzi, F. A. (2014). Sentiment analysis: Bayesian ensemble learning. Decision Support Systems, 68, 26–38.
Fersini, E., Messina, E., & Pozzi, F. A. (2016). Expressive signals in social media languages to improve polarity detection. Information Processing and Manage-
ment, 52, 20–35.
Fowlkes, E. B., & Mallows, C. L. (1983). A method for comparing two hierarchical clusterings. Journal of the American Statistical Association, 78, 553–569.
Fusilier, D. H., Montes-y-Gomez, M., Rosso, P., & Cabrera, R. G. (2015). Detecting positive and negative deceptive opinions using PU-learning. Information
Processing and Management, 51, 433–443.
Fred, A. L., & Jain, A. K. (2005). Combining multiple clusterings using evidence accumulation. IEEE transactions on pattern analysis and machine intelligence,
27(6), 835–850.
Galar, M., Fernandez, A., Barrenechea, E., Bustince, H., & Herrera, F. (2016). Ordering-based pruning for improving the performance of ensembles of classifiers
in the framework of imbalanced datasets. Information Sciences, 354, 178–196.
Gashler, M., Giraud-Carrier, C., & Martinez, T. (2008). Decision tree ensemble: Small heterogeneous is better than large homogeneous. In Proceedings of
ICMLA ’08 (pp. 900–905).
Ghaemi, R., Sulaiman, M. N., Ibrahim, H., & Mustapha, N. (2009). A survey: Clustering ensemble techniques. World Academy of Science, Engineering and
Technology, 50, 636–645.
Ghosh, J., & Acharya, A. (2011). Cluster ensembles. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(4), 305–315.
Glaab, E. (2011). Analyzing functional genomics data using novel ensemble, consensus and data fusion techniques (Unpublished doctoral thesis). Nottingham,
United Kingdom: University of Nottingham.
Gütlein, M. (2006). Large scale attribute selection using wrappers (Unpublished diploma thesis). Freiburg, Germany: University of Freiburg.
Hall, M. A. (1999). Correlation-based feature selection for machine learning (Unpublished doctoral thesis). Hamilton, New Zealand: University of Waikato.
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. (2009). The weka data mining software: An update. SIGKDD Explorations, 11(1),
10–18.
Han, J., & Kamber, M. (2006). Data mining: Concepts and techniques. New York: Morgan Kaufmann Publishers (Chapter 10).
Hernandez-Lobato, D., Martinez-Munoz, G., & Suarez, A. (2011). Empirical analysis and evaluation of approximate techniques for pruning regression bagging
ensembles. Neurocomputing, 74, 2250–2264.
Holland, J. H. (1975). Adaption in natural and artificial systems. Ann Arbor: University of Michigan Press (Chapter 2).
Jain, A. K. (2010). Data clustering: 50 years beyond k-means. Pattern Recognition Letters, 31, 651–666.
Jimenez, F., Sanchez, G., & Juarez, J. M. (2014). Multi-objective evolutionary algorithms for fuzzy classification in survival prediction. Artificial Intelligence in
Medicine, 60, 197–219.
Jin, X., & Han, J. (2010). Expectation maximization clustering. In C. Sammut, & G. I. Webb (Eds.), Encyclopedia of machine learning (pp. 382–383). Berlin:
Springer-Verlag.
Kennedy, J., & Eberhart, R. C. (1995). Particle swarm optimization. In Proceedings of the international conference on neural networks (pp. 1942–1948).
Khan, F. H., Bashir, S., & Qamar, U. (2014). TOM: Twitter opinion mining framework using hybrid classification scheme. Decision Support Systems, 57, 245–257.
Kohonen, T. (2001). Self-organizing maps. Berlin: Springer-Verlag.
Kotsiantis, S. B., & Pintelas, P. E. (2005). Selective averaging of regression models. Annals of Mathematics, Computing & Teleinformatics, 1(3), 65–74.
Kuncheva, L. I., & Whitaker, C. J. (2003). Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning,
51(2), 181–207.
Kuncheva, L. I. (2014). Combining pattern classifiers: Methods and algorithms. New York: Wiley (Chapter 6).
Lin, C., Chen, W., Qiu, C., Wu, Y., Krishnan, S., & Zhou, Q. (2014). LibD3C: Ensemble classifiers with a clustering and dynamic selection strategy. Neurocom-
puting, 123, 424–435.
Liu, Z., Liu, S., Liu, L., Sun, J., Peng, X., & Wang, T. (2016). Sentiment recognition of online course reviews using multi-swarm optimization-based selected
features. Neurocomputing, 185, 11–20.
Ma, Z., Dai, Q., & Liu, N. (2015). Several novel evaluation measures for rank-based ensemble pruning with applications to time series prediction. Expert
Systems with Applications, 42(1), 280–292.
Margineantu, D. D., & Dietterich, T. G. (1997). Pruning adaptive boosting. In Proceedings of the fourteenth international conference on machine learning
(pp. 211–218).
Martinez-Munoz, G., & Suarez, A. (2006). Pruning in ordered bagging ensembles. In Proceedings of the 23rd international conference on machine learning
(pp. 609–616).
Martinez-Munoz, G., & Suarez, A. (2007). Using boosting to prune bagging ensembles. Pattern Recognition Letters, 28, 156–165.
Mendes-Moreira, J., Soares, C., Jorge, A. M., & De Sousa, J. F. (2012). Ensemble approaches for regression: A survey. ACM Computing Surveys, 45(1), 10–39.
Mendialdua, I., Arruti, A., Jauregi, E., Lazkano, E., & Sierra, B. (2015). Classifier subset selection to construct multi-classifiers by means of estimation of
distribution algorithms. Neurocomputing, 157, 46–60.
Mirkin, B. (2001). Reinterpreting the category utility function. Machine Learning, 45(2), 219–228.
Mousavi, R., & Eftekhari, M. (2015). A new ensemble learning methodology based on hybridization of classifier ensemble selection approaches. Applied Soft
Computing, 37, 652–666.
Obitko, M. (2015). Introduction to genetic algorithms Retrieved from http://www.obitko.com/tutorials/.
Onan, A., & Korukoğlu, S. (2015). A feature selection model based on genetic rank aggregation for text sentiment classification. Journal of Information Science,
43(1), 25–38. doi:10.1177/0165551515613226.
Onan, A., Korukoğlu, S., & Bulut, H. (2016). Ensemble of keyword extraction methods and classifiers in text classification. Expert Systems with Applications,
57, 232–247.
Partalas, I., Tsoumakas, G., Katakis, I., & Vlahavas, I. (2006). Ensemble pruning using reinforcement learning. In Hellenic Conference on Artificial Intelligence
(pp. 301–310). Berlin Heidelberg: Springer.
Partalas, I., Tsoumakas, G., & Vlahavas, I. (2012). A study on greedy algorithms for ensemble pruning. Thessaloniki, Greece: Aristotle University of Thessaloniki
(Technical report).
Pinto, F. (2013). Metalearning for dynamic integration in ensemble methods (Thesis proposal). Porto, Portugal: University of Porto.
Prabowo, R., & Thelwall, M. (2009). Sentiment analysis: A combined approach. Journal of Informetrics, 3, 143–157.
Rich, E., & Knight, K. (1991). Artificial intelligence. New York: McGraw-Hill (Chapter 2).
Rokach, L. (2010). Ensemble-based classifiers. Artificial Intelligence Review, 33, 1–39.
Roli, F., Giacinto, G., & Vernazza, G. (2001). Methods for designing multiple classifier systems. Lecture Notes in Computer Science, 2096, 78–87.
Ruta, D., & Gabrys, B. (2001). Application of the evolutionary algorithms for classifier selection in multiple classifier systems with majority voting. In
Proceedings of the second international workshop on multiple classifier systems (pp. 399–408).
Saif, H., He, Y., Fernandez, M., & Alani, H. (2016). Contextual semantics for sentiment analysis of Twitter. Information Processing and Management, 52, 5–19.
Selim, S. Z., & Ismail, M. A. (1984). K-means-type algorithms: A generalized convergence theorem and characterization of local optimality. IEEE Transactions
on Pattern Analysis and Machine Intelligence, 6(1), 81–87.
Sheen, S., & Sirisha, A. P. (2013). Malware detection by pruning of parallel ensembles using harmony search. Pattern Recognition Letters, 34, 1679–1686.
Sheen, S., Aishwarya, S. V., Anitha, R., Raghavan, S. V., & Bhaskar, S. M. (2012). In Ensemble pruning using harmony search: 7209 (pp. 13–24).
Storn, R., & Price, K. (1997). Differential evolution – A simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11(4), 341–359.
Strehl, A., & Ghosh, J. (2002). Cluster ensembles: A knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3,
583–617.
Strehl, A., & Ghosh, J. (2003). Cluster ensembles–A knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3,
583–617.
Sun, J., Wang, G., Cheng, X., & Fu, Y. (2015). Mining affective text to improve social media item recommendation. Information Processing and Management,
51, 444–457.
Sun, Q., & Pfahringer, B. (2011). Bagging ensemble selection. In Proceedings of the 24th Australasian joint conference on artificial intelligence (pp. 251–260).
Swiderski, B., Osowski, S., Kruk, M., & Barhoumi, W. (2016). Aggregation of classifiers ensemble using local discriminatory power and quantiles. Expert
Systems with Applications, 46, 316–323.
Talbi, E. G. (2009). Metaheuristics from design to implementation. New York: Wiley (Chapter 5).
Tamon, C., & Xiang, J. (2000). In On the boosting pruning problem: 1810 (pp. 404–412).
Tan, P. N., Steinbach, M., & Kumar, V. (2005). Introduction to data mining. Boston: Addison-Wesley (Chapter 6).
Theodoridis, S., & Koutroumbas, K. (1999). Pattern recognition. New York: Academic Press (Chapter 4).
Tsoumakas, G., Partalas, I., & Vlahavas, I. (2008). A taxonomy and short review of ensemble selection. In Proceedings of ECAI 08 workshop on supervised and
unsupervised ensemble methods and their applications (pp. 1–6).
Vega-Pons, S., & Ruiz-Shulcloper, J. (2011). A survey of clustering ensemble algorithms. International Journal of Pattern Recognition and Artificial Intelligence,
25(3), 337–372.
Wang, G., Sun, J., Ma, J., Xu, K., & Gu, J. (2014). Sentiment classification: The contribution of ensemble learning. Decision Support Systems, 57, 77–93.
Wang, G., Zhang, Z., Sun, J., Yang, S., & Larson, C. A. (2015). POS-RS: A Random subspace method for sentiment classification based on part-of-speech
analysis. Information Processing and Management, 51, 458–479.
WEKA 3 (2016). Data mining software in Java. Retrieved May 25 from http://www.cs.waikato.ac.nz/ml/weka/.
Whitehead, M., & Yaeger, L. (2009). Building a general purpose cross-domain sentiment mining model. In Proceedings of WRI world congress on computer
science and information engineering (pp. 472–476). Los Angeles.
Xia, R., Xu, F., Yu, J., Qi, Y., & Cambria, E. (2016). Polarity shift detection, elimination and ensemble: A three-stage model for document-level sentiment
analysis. Information Processing and Management, 52, 36–45.
Xia, R., Zong, C., & Li, S. (2011). Ensemble of feature sets and classification algorithms for sentiment classification. Information Sciences, 181, 1138–1152.
Xiao, H., Xiao, Z., & Wang, Y. (2016). Ensemble classification based on supervised clustering for credit scoring. Applied Soft Computing, 43, 73–86.
Yoon, H. G., Kim, H., Kim, C. O., & Song, M. (2016). Opinion polarity detection in Twitter data combining shrinkage regression and topic modelling. Journal
of Informetrics, 10, 634–644.
Zhang, H., & Cao, L. (2014). A spectral clustering based ensemble pruning approach. Neurocomputing, 139, 289–297.
Zhang, D., Ma, J., Yi, J., Niu, X., & Xu, X. (2015). An ensemble method for unbalanced sentiment classification. In Proceedings of 11th international conference
on natural computation (pp. 440–445).
Zhou, Z-H., & Tang, W. (2003). In Selective ensemble of decision trees: 2639 (pp. 476–483).
Zhou, Z-H., Wu, J., & Tang, W. (2002). Ensembling neural networks: Many could be better than all. Artificial Intelligence, 137, 239–263.