Scalable extensions of the ReliefF algorithm
O. G. Reyes, C. Morell, S. Ventura (University of Cordoba, Spain)
Abstract
Multi-label learning has become an important area of research due to the increasing number of modern applications that contain multi-label data. Multi-label data are structured in a more complex way than single-label data. Consequently, techniques that improve the performance of machine learning algorithms on multi-label data are desirable. Feature weighting and feature selection algorithms are important feature engineering techniques with a beneficial impact on machine learning. The ReliefF algorithm is one of the most popular algorithms for feature estimation, and it has proved its usefulness in several domains. This paper presents three extensions of the ReliefF algorithm for working in the multi-label learning context, namely ReliefF-ML, PPT-ReliefF and RReliefF-ML. PPT-ReliefF uses a problem transformation method to convert the multi-label problem into a single-label problem. ReliefF-ML and RReliefF-ML adapt the classic ReliefF algorithm in order to handle multi-label data directly. The proposed ReliefF extensions are evaluated and compared with previous ReliefF extensions on 34 multi-label datasets. The results show that the proposed extensions improve on the preceding ones and overcome some of their drawbacks. The experimental results are validated using several nonparametric statistical tests and confirm the effectiveness of the proposal for better multi-label learning.
Keywords: Multi-label learning, ReliefF algorithm, feature weighting, feature selection, multi-label classification,
label ranking
smaller value of d_L represents a greater similarity in the classification of these instances.

d_L(i, j) = |y_i △ y_j| / q        (6)

For each relevant and irrelevant label of a sampling instance i, a group of k-nearest neighbours is defined. The following groups of Hits (H_il) and Misses (M_il) with respect to an instance i are computed:

• H_il: k-nearest neighbours that have the relevant label l of i as a relevant label.

• M_il: k-nearest neighbours that have the irrelevant label l of i as a relevant label.

Based on the defined groups H_il and M_il, the following probabilities are defined:

P_Hil = ( Σ_{j ∈ H_il} d_L(i, j) ) / k        (7)

P_Mil = ( Σ_{j ∈ M_il} d_L(i, j) ) / k

These two expressions give P_Hil and P_Mil, respectively. ReliefF-ML uses the given representation of the original datasets, i.e. it does not use a PTM. ReliefF-ML requires the retrieval of the k-nearest neighbours for each relevant and irrelevant label of an instance i. However, through a linear search over the training set, the groups of k-nearest neighbours for an instance i can be found efficiently. Consequently, the time complexity of ReliefF-ML is equal to that of the classic ReliefF algorithm (O(m · n · d)). Algorithm 1 describes the ReliefF-ML extension.

Algorithm 1: ReliefF-ML algorithm.
Input:  E → training set of multi-label instances
        m → number of sampling instances
        k → number of nearest neighbours
Output: W → weight vector
1  begin
2    foreach f ∈ F do
3      W_f ← 0;
4    end
5    foreach l ∈ L do
6      P_l ← LabelProbability(l);   (equation 5)
7    end
8    for t ← 1 to m do
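The building blocks above translate directly into code. The following is a minimal NumPy sketch (not the authors' implementation) of the label distance of equation (6), the per-label neighbour groups H_il and M_il, and the group probabilities of equation (7); the function names and the Euclidean feature-space distance are assumptions. The weight-update loop that Algorithm 1 performs from line 8 onwards is omitted here because the pseudocode is truncated at the page break.

import numpy as np

def label_distance(Y, i, j):
    # d_L(i, j) = |y_i symmetric-difference y_j| / q  (equation 6),
    # with Y an (n x q) binary label matrix.
    q = Y.shape[1]
    return np.sum(Y[i] != Y[j]) / q

def nearest_with_label(X, Y, i, label, k):
    # k-nearest neighbours of instance i (Euclidean distance in feature
    # space) among the instances for which `label` is relevant.  When
    # `label` is relevant for i this is the Hit group H_il; when it is
    # irrelevant for i it is the Miss group M_il.
    dist = np.linalg.norm(X - X[i], axis=1)
    dist[i] = np.inf                      # never return the instance itself
    candidates = np.where(Y[:, label] == 1)[0]
    ordered = candidates[np.argsort(dist[candidates])]
    return ordered[:k]

def group_probability(Y, i, group, k):
    # P_H_il or P_M_il: accumulated label distance to the group, over k
    # (equation 7 and its Miss counterpart).
    return sum(label_distance(Y, i, j) for j in group) / k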
Table 1: Statistics of the benchmark datasets: number of instances (n), number of features (d), number of labels (q), number of distinct label sets (d_s), label cardinality (l_c) and label density (l_d). The datasets are ordered by their complexity, calculated as n · d.
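The quantities in this caption can be obtained directly from a dataset's binary label matrix. The sketch below uses the usual definitions of label cardinality (mean number of relevant labels per instance) and label density (cardinality divided by q); the function name and the dictionary layout are illustrative assumptions, and only the n · d complexity score comes from the caption.

import numpy as np

def dataset_statistics(X, Y):
    # X: (n x d) feature matrix, Y: (n x q) binary label matrix.
    n, d = X.shape
    q = Y.shape[1]
    ds = len({tuple(row) for row in Y.astype(int)})    # distinct label sets
    lc = Y.sum(axis=1).mean()                          # label cardinality
    ld = lc / q                                        # label density
    return {"n": n, "d": d, "q": q, "ds": ds, "lc": lc, "ld": ld,
            "complexity": n * d}                       # ordering criterion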
with a weight vector learned by a ReliefF extension. All the feature weights were scaled to [0, 1] using min-max normalization [1].

Feature selection setting

The ReliefF extensions were compared in the FS process to improve the effectiveness of the multi-label learning algorithms. The BRkNN classifier was used as the baseline algorithm, owing to its simplicity and high sensitivity to the presence of irrelevant and redundant features.

A weight vector learned by the ReliefF algorithm can be viewed as a feature ranking. A feature ranking is important to guide the search in the FS process, especially when the search space of candidate feature subsets is large [1]. Several methods have been proposed to evaluate a feature ranking in the FS process [1, 25, 88].

Algorithm 4: Iterative procedure used to evaluate a feature ranking. The expression X ≻ Y means that the X value is better than the Y value.
Input:  W → weight vector
        Φ → multi-label learning algorithm
Output: best evaluation measure value
1  begin
2    R ← Rank(W);
3    R′ ← fTopFeatures(R);
4    BestEvaluation ← Empty;
5    F_s ← ∅;
6    foreach f ∈ R′ do
7      TempEvaluation ← Evaluate(Φ, F_s ∪ f);
8      if TempEvaluation ≻ BestEvaluation then
9        BestEvaluation ← TempEvaluation;
10       F_s ← F_s ∪ f;
11     end
12   end
13   return BestEvaluation;
14 end
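Algorithm 4 can be transcribed almost line by line; a sketch under stated assumptions is given below. The min-max scaling mentioned above is applied before ranking, evaluate stands for training the learning algorithm Φ on a candidate feature subset and returning the chosen measure, and higher_is_better encodes the "better than" comparison; these names are placeholders, not part of the paper.

import numpy as np

def min_max_scale(w):
    # Scale a weight vector to [0, 1] (min-max normalization).
    w = np.asarray(w, dtype=float)
    return (w - w.min()) / (w.max() - w.min())

def evaluate_ranking(weights, evaluate, f, higher_is_better=True):
    # Iterative procedure of Algorithm 4: walk the top-f features of the
    # ranking and greedily keep each feature that improves the measure.
    ranking = np.argsort(min_max_scale(weights))[::-1]   # best feature first
    best_eval, selected = None, []
    for feat in ranking[:f]:
        temp_eval = evaluate(selected + [feat])           # train Phi on F_s U {f}
        improved = (best_eval is None or
                    (temp_eval > best_eval if higher_is_better
                     else temp_eval < best_eval))
        if improved:
            best_eval = temp_eval
            selected.append(feat)
    return best_eval, selected   # Algorithm 4 itself returns only the best value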
Table 2: HL (↓) results for MLkNN in the FW process. Friedman's test rejects the null hypothesis with a p-value of 6.488E-7.
of each feature correctly. It also confirms the effectiveness of the ReliefF algorithm as a feature weighting method for better multi-label lazy learning.

4.4.2. Feature selection

Tables 4 and 5 show the results of the BRkNN classifier using the whole feature space and the feature subsets determined from the ReliefF extensions, for the HL and F1Ex evaluation measures. The results for the AP and RL measures can be consulted on the available web page. In the tables, the last two rows show the average rank (Rank) and the position in the ranking (Pos.) of each method.

Generally speaking, the results showed that the BRkNN classifier using the feature subsets determined from the feature rankings of PPT-ReliefF, ReliefF-ML and RReliefF-ML performed better than the BRkNN that used the whole feature space. Moreover, the BRkNN classifier obtained a lower average rank when it used the feature subsets determined from PPT-ReliefF, ReliefF-ML and RReliefF-ML.

The results showed that on simple multi-label datasets, such as Flags, Birds, Emotions, Cal500, Yeast, Genbase and Medical, the FS process yielded a considerable improvement in the effectiveness of the BRkNN classifier.

It is important to highlight that BRkNN using the feature subsets of PPT-ReliefF, ReliefF-ML and RReliefF-ML obtained the best results on the eleven complex multi-label datasets that come from the Yahoo collection. Several of the multi-label datasets have a large number of features, e.g. Bibtex (1836 features), Medical (1449 features), Genbase (1186 features), Enron (1001 features) and the Yahoo collection (from 21924 to 52350 features). The experimental results can be considered formidable, given that only the top 100 features of the feature rankings were considered in the FS process.
Classifier: MLkNN
Dataset    -    BR-ReliefF    LP-ReliefF    MReliefF    PPT-ReliefF    ReliefF-ML    RReliefF-ML
Flags 0.696 0.713 0.718 0.684 0.694 0.708 0.720
Cal500 0.326 0.328 0.331 0.308 0.331 0.332 0.339
Emotions 0.587 0.628 0.626 0.633 0.664 0.676 0.643
Birds 0.527 0.490 0.524 0.518 0.523 0.528 0.569
Yeast 0.611 0.609 0.617 0.611 0.609 0.611 0.617
Scene 0.682 0.669 0.658 0.656 0.685 0.656 0.663
Genbase 0.950 0.0953 0.977 0.0952 0.980 0.967 0.981
Medical 0.439 0.412 0.533 0.421 0.515 0.515 0.543
Enron 0.417 0.417 0.461 0.357 0.475 0.419 0.482
Corel5k 0.019 0.017 0.005 0.010 0.023 0.027 0.035
Mediamill 0.533 0.533 0.533 0.528 0.534 0.533 0.537
Corel16k01 0.013 0.010 0.007 0.010 0.013 0.011 0.015
Corel16k02 0.016 0.013 0.009 0.010 0.033 0.020 0.027
Corel16k03 0.012 0.012 0.009 0.015 0.020 0.015 0.012
Corel16k04 0.015 0.012 0.026 0.027 0.020 0.028 0.022
Corel16k05 0.015 0.018 0.006 0.013 0.019 0.024 0.018
Corel16k06 0.018 0.018 0.016 0.017 0.034 0.020 0.025
Corel16k07 0.017 0.020 0.011 0.018 0.027 0.020 0.025
Corel16k08 0.018 0.017 0.009 0.005 0.024 0.023 0.020
Corel16k09 0.005 0.006 0.009 0.010 0.018 0.013 0.016
Corel16k10 0.006 0.008 0.009 0.006 0.016 0.018 0.012
Bibtex 0.161 0.174 0.189 0.201 0.226 0.201 0.205
TMC2007-500 0.660 0.604 0.614 0.609 0.617 0.680 0.671
Arts 0.034 0.029 0.033 0.024 0.050 0.055 0.039
Science 0.015 0.010 0.017 0.016 0.018 0.022 0.023
Business 0.737 0.728 0.736 0.729 0.735 0.737 0.735
Health 0.363 0.423 0.347 0.421 0.387 0.365 0.376
Reference 0.236 0.221 0.131 0.219 0.241 0.264 0.244
Education 0.026 0.026 0.026 0.032 0.032 0.028 0.027
Recreation 0.058 0.060 0.059 0.065 0.064 0.060 0.056
Entertaiment 0.108 0.104 0.113 0.107 0.128 0.116 0.124
Computers 0.369 0.409 0.432 0.378 0.433 0.427 0.429
Society 0.155 0.150 0.173 0.135 0.165 0.178 0.169
Social 0.263 0.223 0.310 0.286 0.285 0.336 0.321
Rank 4.941 5.323 4.691 5.265 2.618 2.706 2.456
Pos. 5 7 4 6 2 3 1
Table 3: F1Ex (↑) results for MLkNN in the FW process. Friedman's test rejects the null hypothesis with a p-value of 5.295E-11.
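The "Rank" and "Pos." rows of these tables can be reproduced from a matrix with one row per dataset and one column per method; the sketch below (an illustration, not the authors' scripts) assigns rank 1 to the best method on each dataset and averages over datasets.

import numpy as np
from scipy.stats import rankdata

def average_ranks(results, higher_is_better=True):
    # results: (datasets x methods) array of an evaluation measure.
    signed = -results if higher_is_better else results
    per_dataset = np.apply_along_axis(rankdata, 1, signed)   # rank 1 = best
    avg_rank = per_dataset.mean(axis=0)                      # the "Rank" row
    position = rankdata(avg_rank, method="min").astype(int)  # the "Pos." row
    return avg_rank, position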
The BRkNN classifier using a small number of features (100 features or fewer) performed better than the BRkNN that used the whole feature space.

A statistical analysis to detect significant differences in performance among the ReliefF extensions was carried out. Friedman's test rejected the null hypothesis in all analyzed cases, considering a significance level α = 0.05. The p-values returned by Friedman's test can be consulted in the tables.

Afterwards, a Bergmann-Hommel post-hoc test for all pairwise comparisons was carried out. The results of the Bergmann-Hommel test are displayed in Figure 2.

From a statistical point of view, the BRkNN classifier using the feature subsets determined from the PPT-ReliefF, RReliefF-ML and ReliefF-ML extensions outperformed the BRkNN classifier that used the whole feature space for the four multi-label evaluation measures considered. The BRkNN classifier using the three proposed ReliefF extensions performed better than the BRkNN classifier using the BR-ReliefF, LP-ReliefF and MReliefF extensions for the four multi-label evaluation measures.

The Bergmann-Hommel test did not detect significant differences among the PPT-ReliefF, ReliefF-ML and RReliefF-ML extensions. However, PPT-ReliefF obtained the first position in two of the four average rankings (four rankings = one classifier × four evaluation measures) returned by Friedman's test. RReliefF-ML obtained the first position in one average ranking and the second position in two rankings. ReliefF-ML obtained the third position in three rankings.

The RReliefF-ML extension obtained the best results for the HL measure, followed by PPT-ReliefF. ReliefF-ML performed better for the F1Ex measure, followed by PPT-ReliefF.
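The omnibus Friedman test reported in the table captions can be run on the same per-dataset results matrix; a short sketch with SciPy follows. The Bergmann-Hommel post-hoc procedure is not available in SciPy, so only the omnibus test is shown, and results is an assumed (datasets x methods) array.

from scipy.stats import friedmanchisquare

def friedman_p_value(results):
    # One column per ReliefF extension (or the baseline), one row per dataset.
    # The null hypothesis is that all methods perform equivalently.
    statistic, p_value = friedmanchisquare(*results.T)
    return p_value

# The null hypothesis is rejected at significance level alpha = 0.05
# whenever friedman_p_value(results) < 0.05.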
Figure 1: Significant differences in performance among the ReliefF extensions for the MLkNN classifier, according to the Bergmann-Hommel test.
Figure 2: Significant differences in performance among the ReliefF extensions for the BRkNN classifier, according to the Bergmann-Hommel test.
PPT-ReliefF obtained the best results for the RL and AP measures, followed by RReliefF-ML.

From the statistical analysis, we concluded that the three proposed ReliefF extensions correctly determine the usefulness of each feature. Also, this confirms the effectiveness of the ReliefF algorithm as a feature selection method for better multi-label learning.
Classifier: BRkNN
Dataset    -    BR-ReliefF    LP-ReliefF    MReliefF    PPT-ReliefF    ReliefF-ML    RReliefF-ML
Flags 0.271 0.250 0.226 0.237 0.228 0.228 0.232
Cal500 0.145 0.140 0.140 0.140 0.140 0.140 0.139
Emotions 0.197 0.182 0.178 0.175 0.176 0.179 0.178
Birds 0.049 0.042 0.044 0.045 0.043 0.046 0.042
Yeast 0.203 0.208 0.205 0.200 0.201 0.203 0.200
Scene 0.108 0.115 0.115 0.115 0.114 0.146 0.118
Genbase 0.003 0.004 0.003 0.004 0.003 0.002 0.001
Medical 0.021 0.027 0.015 0.024 0.015 0.026 0.015
Enron 0.058 0.053 0.050 0.052 0.049 0.058 0.049
Corel5k 0.010 0.009 0.010 0.010 0.009 0.009 0.009
Mediamill 0.032 0.033 0.032 0.032 0.032 0.031 0.031
Corel16k01 0.020 0.020 0.020 0.020 0.020 0.019 0.019
Corel16k02 0.020 0.019 0.019 0.019 0.019 0.018 0.019
Corel16k03 0.020 0.020 0.020 0.020 0.020 0.020 0.020
Corel16k04 0.019 0.019 0.019 0.020 0.019 0.019 0.019
Corel16k05 0.019 0.019 0.019 0.019 0.018 0.017 0.018
Corel16k06 0.020 0.019 0.019 0.019 0.018 0.017 0.018
Corel16k07 0.018 0.018 0.018 0.018 0.017 0.017 0.017
Corel16k08 0.019 0.018 0.018 0.018 0.018 0.018 0.018
Corel16k09 0.019 0.018 0.018 0.018 0.017 0.017 0.017
Corel16k10 0.021 0.020 0.020 0.020 0.019 0.017 0.019
Bibtex 0.015 0.014 0.014 0.014 0.013 0.013 0.014
TMC2007-500 0.064 0.066 0.065 0.064 0.064 0.066 0.064
Arts 0.063 0.062 0.063 0.062 0.062 0.062 0.061
Science 0.035 0.033 0.034 0.036 0.033 0.032 0.032
Business 0.029 0.029 0.030 0.032 0.027 0.027 0.029
Health 0.050 0.051 0.051 0.049 0.049 0.047 0.047
Reference 0.037 0.040 0.037 0.036 0.032 0.032 0.031
Education 0.044 0.045 0.046 0.042 0.037 0.035 0.036
Recreation 0.063 0.062 0.064 0.058 0.058 0.058 0.058
Entertaiment 0.063 0.061 0.060 0.064 0.057 0.057 0.056
Computers 0.040 0.042 0.040 0.039 0.039 0.036 0.040
Society 0.058 0.050 0.054 0.051 0.049 0.050 0.051
Social 0.030 0.032 0.032 0.031 0.026 0.025 0.026
Rank 5.485 5.000 4.779 4.515 2.794 2.853 2.573
Pos. 7 6 5 4 2 3 1
Table 4: HL (↓) results for BRkNN in the FS process. Friedman's test rejects the null hypothesis with a p-value of 5.122E-11.
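For reference, the two example-based measures reported in Tables 4 and 5 can be computed as follows; this sketch uses the standard definitions (Hamming loss: fraction of wrongly predicted label positions, lower is better; example-based F1: per-instance F1 averaged over instances, higher is better), and the zero-division convention is an assumption, not taken from the paper.

import numpy as np

def hamming_loss(Y_true, Y_pred):
    # HL (lower is better): fraction of wrongly predicted label positions.
    # Both arguments are (n x q) binary matrices.
    return np.mean(Y_true != Y_pred)

def f1_example_based(Y_true, Y_pred):
    # F1Ex (higher is better): 2|Y ∩ Z| / (|Y| + |Z|) per instance, averaged.
    intersection = np.sum((Y_true == 1) & (Y_pred == 1), axis=1)
    size_sum = Y_true.sum(axis=1) + Y_pred.sum(axis=1)
    # Instances with no true and no predicted labels contribute 0 here.
    return np.mean(2 * intersection / np.maximum(size_sum, 1))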
4.4.3. Discussion

Table 6 shows a summary of the main characteristics of the state-of-the-art ReliefF extensions and of the three extensions proposed in this paper. The first column gives the time complexity of each ReliefF extension. The column named "Label correlation" specifies whether the corresponding ReliefF extension handles the dependencies among labels or not. The column "Transformation" states the type of PTM used by each ReliefF extension. The last column describes some advantages and disadvantages of the ReliefF extensions.

Those ReliefF extensions that use BR and RPC as PTM are very expensive on multi-label datasets that have a large number of labels. PPT-ReliefF, RReliefF-ML and ReliefF-ML have the same computational complexity and perform faster than the MReliefF and BR-ReliefF extensions. However, the RReliefF-ML method is faster than ReliefF-ML and PPT-ReliefF, since RReliefF-ML does not use any PTM and retrieves only k-nearest neighbours for a sampling instance. On the other hand, it is important to highlight that RReliefF-ML is the simplest of the proposed extensions, and it obtains very good results in the FW and FS processes.

The evidence suggests that the ReliefF extensions which use the Label Powerset family of methods (i.e. LP and PPT) as PTM perform better than those ReliefF extensions that use a PTM that converts the multi-label problem into several single-label problems (e.g. BR and RPC), not only in terms of computing time but also in the efficacy of determining the feature weights. On the other hand, the ReliefF extensions which consider the label dependencies perform better than those extensions which do not.

In the case of the FW process, the PPT-ReliefF, ReliefF-ML and RReliefF-ML extensions improved the performance of the four lazy classifiers on the four evaluation measures considered.
Classifier: BRkNN
Dataset    -    BR-ReliefF    LP-ReliefF    MReliefF    PPT-ReliefF    ReliefF-ML    RReliefF-ML
Flags 0.675 0.728 0.746 0.724 0.730 0.730 0.729
Cal500 0.302 0.321 0.318 0.318 0.318 0.325 0.320
Emotions 0.584 0.624 0.631 0.600 0.621 0.639 0.633
Birds 0.523 0.584 0.582 0.572 0.589 0.542 0.598
Yeast 0.583 0.590 0.585 0.583 0.580 0.593 0.590
Scene 0.551 0.523 0.550 0.545 0.541 0.500 0.521
Genbase 0.976 0.800 0.982 0.705 0.983 0.983 0.900
Medical 0.335 0.142 0.506 0.237 0.595 0.400 0.560
Enron 0.267 0.388 0.400 0.333 0.486 0.347 0.410
Corel5k 0.004 0.018 0.003 0.040 0.025 0.016 0.021
Mediamill 0.527 0.528 0.518 0.523 0.527 0.529 0.534
Corel16k01 0.009 0.019 0.019 0.019 0.020 0.020 0.020
Corel16k02 0.034 0.038 0.040 0.035 0.042 0.044 0.041
Corel16k03 0.013 0.015 0.014 0.016 0.024 0.023 0.019
Corel16k04 0.030 0.038 0.036 0.035 0.039 0.040 0.038
Corel16k05 0.014 0.008 0.009 0.012 0.020 0.027 0.018
Corel16k06 0.048 0.050 0.055 0.061 0.090 0.099 0.079
Corel16k07 0.008 0.009 0.010 0.006 0.009 0.024 0.009
Corel16k08 0.028 0.020 0.020 0.017 0.031 0.030 0.022
Corel16k09 0.060 0.050 0.066 0.060 0.085 0.099 0.090
Corel16k10 0.038 0.040 0.040 0.055 0.066 0.060 0.065
Bibtex 0.070 0.104 0.127 0.160 0.224 0.129 0.149
TMC2007-500 0.596 0.578 0.567 0.585 0.600 0.593 0.590
Arts 0.028 0.044 0.038 0.055 0.066 0.067 0.060
Science 0.014 0.016 0.015 0.017 0.020 0.020 0.018
Business 0.726 0.716 0.718 0.687 0.798 0.766 0.792
Health 0.230 0.221 0.225 0.229 0.245 0.247 0.256
Reference 0.403 0.398 0.365 0.415 0.428 0.421 0.419
Education 0.039 0.032 0.039 0.041 0.085 0.089 0.078
Recreation 0.037 0.032 0.030 0.021 0.065 0.098 0.077
Entertaiment 0.096 0.092 0.095 0.085 0.122 0.123 0.125
Computers 0.366 0.354 0.325 0.365 0.421 0.410 0.410
Society 0.152 0.150 0.150 0.148 0.169 0.174 0.145
Social 0.217 0.223 0.215 0.245 0.248 0.248 0.261
Rank 5.412 5.103 4.912 5.118 2.338 2.279 2.838
Pos. 7 5 4 6 2 1 3
Table 5: F1Ex (↑) results for BRkNN in the FS process. Friedman's test rejects the null hypothesis with a p-value of 6.091E-11.
However, the results of the weighted lazy classifiers using BR-ReliefF, LP-ReliefF and MReliefF vary according to the measure and dataset employed.

The evidence suggests that the weight vector learned in the training phase allows the distance function to recover those nearest examples in the feature space that are associated with the highest-confidence set of labels for classifying a query instance. The proposed ReliefF extensions performed well for simple and complex multi-label datasets on the MLC and LR tasks. However, the results showed that there was a smaller increase in performance on those multi-label datasets with a small label density and a large number of distinct label sets at the same time, e.g. the Corel5k and Corel16k collections.

In the case of the FS process, the results showed that PPT-ReliefF outperformed the LP-ReliefF extension. This result confirmed that the PPT technique is significantly superior to the LP method. PPT made it possible to reduce the complexity of the multi-label datasets without loss of effectiveness in machine learning.
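To make the LP versus PPT comparison concrete, the sketch below shows a Label Powerset style transformation and a simple pruned variant: each distinct label set becomes one single-label class, and the pruned version drops the instances whose label set occurs fewer than p times. Read's PPT [49] can also reintroduce pruned instances via subsets of their label sets; that step is omitted, so this is an assumption-laden illustration rather than the exact PPT used by PPT-ReliefF.

from collections import Counter

def label_powerset(Y):
    # LP: map every distinct label set (a row of the binary matrix Y)
    # to a single-label class identifier.
    classes, transformed = {}, []
    for row in map(tuple, Y):
        transformed.append(classes.setdefault(row, len(classes)))
    return transformed, classes

def pruned_problem_transformation(X, Y, p=2):
    # PPT-like pruning: keep only instances whose label set occurs at
    # least p times, then apply LP to the remaining data.
    counts = Counter(map(tuple, Y))
    keep = [i for i, row in enumerate(map(tuple, Y)) if counts[row] >= p]
    y_single, classes = label_powerset([Y[i] for i in keep])
    return [X[i] for i in keep], y_single, classes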
The evidence suggested that the distributions of the relevant features within the f top features of the rankings determined by PPT-ReliefF, ReliefF-ML and RReliefF-ML were better than the distributions of the relevant features in the three other ReliefF extensions. Moreover, the three proposed ReliefF extensions performed well on simple and complex multi-label datasets for the MLC and LR tasks in the FS process. According to the results, the proposed ReliefF extensions performed better on datasets that have a small label density.

5. Conclusions

In this work, three scalable ReliefF extensions to multi-label learning, called PPT-ReliefF, ReliefF-ML and RReliefF-ML, have been presented. The PPT-ReliefF extension uses the PPT method to convert the original multi-label dataset into a new multi-class dataset. The ReliefF-ML extension can be considered a generalization of the classic ReliefF. On the other hand, the RReliefF-ML extension is based on the principles of the well-known adaptation of ReliefF to regression problems. The three proposed extensions take into account the label dependencies and the issue of interacting features.

The proposed ReliefF extensions were extensively compared with previous ReliefF extensions. The experimental study was divided into two parts. In the first part, the ReliefF extensions were analysed in the FW process to improve the performance of the multi-label lazy algorithms. The statistical analysis showed that the three proposed ReliefF extensions outperformed previous ReliefF extensions, improving the performance of the multi-label lazy algorithms. The statistical tests showed that the weighted lazy algorithms that use the weight vector learned by PPT-ReliefF perform well, followed by those that use RReliefF-ML and ReliefF-ML.

In the second part of the experiment, the ReliefF extensions were evaluated in the FS process. The baseline classifier using the feature subsets determined from the three proposed ReliefF extensions outperformed the classifier that uses the whole feature space and the feature subsets determined from previous ReliefF extensions. The study shows that with a small number of features the baseline classifier obtains good results on complex multi-label datasets.

The PPT-ReliefF, RReliefF-ML and ReliefF-ML extensions performed well for the MLC and LR tasks in the FW and FS processes. These extensions are scalable on simple and complex multi-label datasets with different properties. The experimental study confirms the benefits of the ReliefF algorithm as a feature engineering technique for better MLL, which was the main motivation for the present work. We recommend that when new ReliefF extensions to MLL are proposed, they should be compared with the PPT-ReliefF, RReliefF-ML and ReliefF-ML extensions using several evaluation measures.

Future work will carry out a comparative study of how the proposed ReliefF extensions scale in comparison with other state-of-the-art multi-label feature estimation and feature selection algorithms. Furthermore, it would be important to examine the effectiveness of the proposal on synthetic multi-label datasets, where the numbers of relevant, irrelevant and redundant features are known.

References

[1] I. Witten, E. Frank, Data Mining: Practical machine learning tools and techniques, 2nd Edition, Morgan Kaufmann, 2005.
[2] G. Tsoumakas, I. Katakis, Multi-label classification: An overview, International Journal of Data Warehousing & Mining 3 (2007) 1–13.
[3] G. Tsoumakas, I. Katakis, I. Vlahavas, Data Mining and Knowledge Discovery Handbook, 2nd Edition, Springer-Verlag, 2010, Ch. Mining Multi-label Data, pp. 667–686.
[4] G. Madjarov, D. Kocev, D. Gjorgjevikj, An extensive experimental comparison of methods for multi-label learning, Pattern Recognition 45 (2012) 3084–3104.
[5] A. McCallum, Multi-label text classification with a mixture model trained by EM, in: Working Notes of the AAAI-99 Workshop on Text Learning, 1999.
[6] T. Li, M. Ogihara, Detecting emotion in music, in: Proceedings of the International Symposium on Music Information Retrieval, Washington D.C., USA, 2003, pp. 239–240.
[7] S. Yang, S. Kim, Y. Ro, Semantic home photo categorization, Circuits and Systems for Video Technology, IEEE Transactions 17 (2007) 324–335.
[8] M. Boutell, J. Luo, X. Shen, C. Brown, Learning multi-label scene classification, Pattern Recognition 37 (2004) 1757–1771.
[9] S. Diplaris, G. Tsoumakas, P. Mitkas, I. Vlahavas, Protein classification with multiple algorithms, in: Proceedings of the 10th Panhellenic Conference on Informatics (PCI 2005), 2005, pp. 448–456.
[10] M. L. Zhang, Z. H. Zhou, Multi-label neural networks with applications to functional genomics and text categorization, IEEE Transactions on Knowledge and Data Engineering 18 (2006) 1338–1351.
[11] G. L. Monica, P. M. Granitto, J. C. Gomez, Spot defects detection in cDNA microarray images, Pattern Anal Applic, Springer-Verlag 16 (2013) 307–319.
[12] P. Duygulu, K. Barnard, N. de Freitas, D. Forsyth, Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary, in: Proceedings of the 7th European Conference on Computer Vision, 2002, pp. IV:97–112.
[13] N. Ueda, K. Saito, Parametric mixture models for multi-labeled text, in: Proceedings of Neural Information Processing Systems 15 (NIPS 15), MIT Press, 2002, pp. 737–744.
[14] K. Barnard, P. Duygulu, N. de Freitas, D. Forsyth, D. Blei, M. I. Jordan, Matching words and pictures, Journal of Machine Learning Research 3 (2003) 1107–1135.
[15] M. Worring, C. Snoek, J. van Gemert, J. M. Geusebroek, A. Smeulders, The challenge problem for automated detection
of 101 semantic concepts in multimedia, in: Proceedings of the 14th Annual ACM International Conference on Multimedia, 2006, pp. 421–430.
[16] D. Turnbull, L. Barrington, D. Torres, G. Lanckriet, Semantic annotation and retrieval of music and sound effects, IEEE Transactions on Audio, Speech and Language Processing 16(2) (2008) 467–476.
[17] R. Bellman, Adaptive Control Processes: A Guided Tour, Rand Corporation Research Studies, Princeton University Press, 1961.
[18] D. T. Larose, Discovering Knowledge in Data: An Introduction to Data Mining, John Wiley & Sons, 2005.
[19] D. Wettschereck, D. W. Aha, T. Mohri, A review and empirical evaluation of feature weighting methods for a class of lazy learning algorithms, Artificial Intelligence Review 11 (1997) 273–314.
[20] A. Abraham, E. Corchado, J. Corchado, Hybrid learning machines, Neurocomputing 72 (2009) 2729–2730.
[21] K. Kira, L. Rendell, A practical approach to feature selection, in: Proceedings of the Int. Conf. on Machine Learning, Morgan Kaufmann, 1992, pp. 249–256.
[22] I. Kononenko, Estimating attributes: Analysis and extension of ReliefF, in: Proceedings of the 7th European Conference on Machine Learning, ECML-94, Springer-Verlag, 1994, pp. 171–182.
[23] I. Kononenko, E. Simec, M. R. Sikonja, Overcoming the myopia of inductive learning algorithms with ReliefF, Appl. Int. 7 (1997) 39–55.
[24] M. Robnik-Sikonja, I. Kononenko, Theoretical and empirical analysis of ReliefF and RReliefF, Machine Learning 53 (1-2) (2003) 23–69.
[25] R. Ruiz, J. C. Riquelme, J. S. Aguilar-Ruiz, Heuristic search over a ranking for feature selection, in: Proceedings of IWANN 2005, Vol. LNCS 3512, Springer-Verlag Berlin Heidelberg, 2005, pp. 742–749.
[26] N. Spolar, E. Cherman, M. Monard, H. Lee, Filter approach feature selection methods to support multi-label learning based on ReliefF and Information Gain, in: Proceedings of the Advances in Artificial Intelligence - SBIA 2012, LNCS, Springer Berlin Heidelberg, 2012, pp. 72–81.
[27] M. Hall, Correlation-based feature selection for discrete and numeric class machine learning, in: Proceedings of the 17th International Conference on Machine Learning, 2000, pp. 359–366.
[28] L. Yu, H. Liu, Feature selection for high-dimensional data: a fast correlation-based filter solution, in: Proceedings of the 20th International Conference on Machine Learning, ICML-00, 2003, pp. 856–863.
[29] I. Guyon, A. Elisseeff, An introduction to variable and feature selection, Journal of Machine Learning Research 3 (2003) 1157–1182.
[30] J. Tang, S. Alelyani, H. Liu, Data Classification: Algorithms and Applications, CRC Press, 2015, Ch. Feature Selection for Classification: A Review, pp. 37–64.
[31] R. Tibshirani, Regression shrinkage and selection via the LASSO, Journal of the Royal Statistical Society (1996) 267–288.
[32] H. Zou, The adaptive lasso and its oracle properties, Journal of the American Statistical Association 101 (476) (2006) 1418–1429.
[33] M. Yuan, Y. Lin, Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society 68 (1) (2006) 49–67.
[34] P. Zhao, B. Yu, On model selection consistency of lasso, The Journal of Machine Learning Research 7 (2006) 2541–2563.
[35] D. Kong, R. Fujimaki, J. Liu, F. Nie, C. Ding, Exclusive feature learning on arbitrary structures via l1,2-norm, in: Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, K. Weinberger (Eds.), Advances in Neural Information Processing Systems 27, Curran Associates, Inc., 2014, pp. 1655–1663.
[36] Y. Zhou, R. Jin, S. C. H. Hoi, Exclusive lasso for multi-task feature selection, Journal of Machine Learning Research 9 (2010) 988–995.
[37] P. Gong, J. Ye, C. Zhang, Robust multi-task feature learning, in: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '12, ACM, New York, USA, 2012, pp. 895–903.
[38] J. Zhou, J. Liu, V. Narayan, J. Ye, Modeling disease progression via fused sparse group lasso, in: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '12, ACM, New York, NY, USA, 2012, pp. 1095–1103.
[39] R. Ruiz, J. C. Riquelme, J. S. Aguilar-Ruiz, Fast feature ranking algorithm, in: Proceedings of Knowledge-Based Intelligent Information and Engineering Systems, KES-2003, Springer Berlin, 2003, pp. 325–331.
[40] V. Jovanoski, N. Lavrac, Feature subset selection in association rules learning systems, in: Proceedings of Analysis, Warehousing and Mining the Data, 1999, pp. 74–77.
[41] B. Zupan, M. Bohanec, J. Demsar, I. Bratko, Learning by discovering concept hierarchies, Artificial Intelligence 109 (1-2) (1999) 211–242.
[42] J. J. Liu, J. T.-Y. Kwok, An extended genetic rule induction algorithm, in: Proceedings of the Congress of Evolutionary Computation, 2000, pp. 458–463.
[43] K. Trohidis, G. Tsoumakas, G. Kalliris, I. Vlahavas, Multilabel classification of music into emotions, in: Proceedings of the 2008 International Conference on Music Information Retrieval, ISMIR 2008, 2008, pp. 325–330.
[44] S. Dendamrongvit, P. Vateekul, M. Kubat, Irrelevant attributes and imbalanced classes in multi-label text-categorization domains, Intelligent Data Analysis 15 (6) (2011) 843–859.
[45] G. Lastra, O. Luaces, J. R. Quevedo, A. Bahamonde, Graphical feature selection for multilabel classification tasks, in: Proceedings of the International Conference on Advances in Intelligent Data Analysis, 2011, pp. 246–257.
[46] D. Kong, C. Ding, H. Huang, H. Zhao, Multi-label ReliefF and F-statistic feature selections for image annotation, in: Proceedings of Computer Vision and Pattern Recognition (CVPR), 2012, pp. 2352–2359.
[47] N. Spolaor, E. Alvares, M. Carolina, H. Diana, A comparison of multi-label feature selection methods using the problem transformation approach, Electronic Notes in Theoretical Computer Science 292 (2013) 135–151.
[48] N. Spolaor, E. A. Cherman, M. C. Monard, Using ReliefF for multi-label feature selection, in: Proceedings of the Conferencia Latinoamericana de Informática, Brazil, 2011, pp. 960–975.
[49] J. Read, A pruned problem transformation method for multi-label classification, in: Proceedings of the 2008 New Zealand Computer Science Research Student Conference (NZCSRS 2008), 2008, pp. 143–150.
[50] M. Robnik-Sikonja, I. Kononenko, An adaptation of Relief for attribute estimation in regression, in: Proceedings of ICML-97, 1997, pp. 296–304.
[51] J. Demsar, Statistical comparisons of classifiers over multiple data sets, Journal of Machine Learning Research 7 (2006) 1–30.
[52] S. García, F. Herrera, An extension on "Statistical Comparisons of Classifiers over Multiple Data Sets" for all pairwise comparisons, Journal of Machine Learning Research 9 (2008) 2677–2694.
[53] S. García, A. Fernández, J. Luengo, F. Herrera, Advanced nonparametric tests for multiple comparisons in the design of
experiments in computational intelligence and data mining: Experimental analysis of power, Information Sciences 180 (2010) 2044–2064.
[54] J. Derrac, S. García, D. Molina, F. Herrera, A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms, Swarm and Evolutionary Computation 1 (2011) 3–18.
[55] K. Brinker, J. Furnkranz, E. Hullermeier, A unified model for multilabel classification and ranking, in: Proceedings of the 17th European Conference on Artificial Intelligence, ECAI-06, 2006, pp. 489–493.
[56] R. Schapire, Y. Singer, BoosTexter: a boosting-based system for text categorization, Machine Learning 39 (2000) 135–168.
[57] S. Godbole, S. Sarawagi, Discriminative methods for multi-labeled classification, in: Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2004, 2004, pp. 22–30.
[58] I. Kononenko, M. Robnik-Sikonja, Computational Methods of Feature Selection, Chapman & Hall/CRC, 2008, Ch. Non-Myopic Feature Quality Evaluation with (R)ReliefF, pp. 169–191.
[59] R. Gilad-Bachrach, A. Navot, N. Tishby, Margin based feature selection - Theory and Algorithms, in: Proceedings of the 21st International Conference on Machine Learning, 2004, pp. 43–50.
[60] Y. Sun, Iterative RELIEF for feature weighting: Algorithms, Theories, and Applications, IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (6) (2007) 1035–1051.
[61] Y. Sun, D. Wu, A RELIEF based feature extraction algorithm, in: Proceedings of the SIAM International Conference on Data Mining, Atlanta, USA, 2008, pp. 188–195.
[62] U. Pompe, I. Kononenko, Linear space induction in first order logic with ReliefF, Mathematical and Statistical Methods in Artificial Intelligence, Springer Verlag, New York.
[63] M. Robnik-Sikonja, Experiments with cost-sensitive feature evaluation, in: Proceedings of the European Conference on Machine Learning, ECML-2003, 2003, pp. 325–336.
[64] M. Robnik-Sikonja, K. Vanhoof, Evaluation of ordinal attributes at value level, Data Mining and Knowledge Discovery 14 (2007) 225–243.
[65] A. M. Qamar, E. Gaussier, RELIEF Algorithm and Similarity Learning for k-NN, International Journal of Computer Information Systems and Industrial Management Applications 4 (2012) 445–458.
[66] A. Zafra, M. Pechenizkiy, S. Ventura, ReliefF-MI: An extension of ReliefF to multiple instance learning, Neurocomputing 75 (2012) 210–218.
[67] I. Slavkov, J. Karcheska, D. Kocev, S. Kalajdziski, S. Dzeroski, Extending ReliefF for Hierarchical Multi-label Classification, in: Proceedings of the 2013 European Conference on Machine Learning and Knowledge Discovery in Databases, ECML/PKDD-14, 2014.
[68] O. Reyes, C. Morell, S. Ventura, ReliefF-ML: An extension of ReliefF Algorithm to Multi-label Learning, in: Proceedings of CIARP 2013, Vol. 8259 of Part II, Lecture Notes in Computer Science, Springer-Verlag Berlin Heidelberg, Habana, Cuba, 2013, pp. 528–535.
[69] J. Read, B. Pfahringer, G. Holmes, Multi-label classification using ensembles of pruned sets, in: Proceedings of the 8th IEEE International Conference on Data Mining, 2008, pp. 995–1000.
[70] M. L. Zhang, Z. H. Zhou, ML-kNN: A lazy learning approach to multi-label learning, Pattern Recognition 40 (7) (2007) 2038–2048.
[71] J. Read, Scalable multi-label classification, Ph.D. thesis, University of Waikato, Hamilton, New Zealand (2010).
[72] N. Spolaor, E. A. Cherman, M. C. Monard, H. D. Lee, ReliefF for multi-label feature selection, in: Proceedings of the International Brazilian Conference, IEEE, 2013.
[73] M. L. Zhang, J. M. Peña, V. Robles, Feature selection for multi-label naive Bayes classification, Information Sciences 179 (2009) 3218–3229.
[74] F. B. et al., The 9th annual MLSP competition: New methods for acoustic classification of multiple simultaneous bird species in a noisy environment, in: Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2013.
[75] E. Correa, A. Plastino, A. Freitas, A Genetic Algorithm for Optimizing the Label Ordering in Multi-Label Classifier Chains, in: Proceedings of ICTAI-2013, 2013.
[76] C. Snoek, M. Worring, J. van Gemert, J.-M. Geusebroek, A. Smeulders, The challenge problem for automated detection of 101 semantic concepts in multimedia, in: Proceedings of ACM Multimedia, ACM, Santa Barbara, USA, 2006, pp. 421–430.
[77] A. Elisseeff, J. Weston, A kernel method for multi-labelled classification, Advances in Neural Information Processing Systems 14.
[78] J. Read, B. Pfahringer, G. Holmes, E. Frank, Classifier chains for multi-label classification, in: Proceedings of the 20th European Conference on Machine Learning, 2009, pp. 254–269.
[79] B. Klimt, Y. Yang, The Enron corpus: A new dataset for email classification research, in: Proceedings of the 15th European Conference on Machine Learning, 2004, pp. 217–226.
[80] A. Srivastava, B. Zane-Ulman, Discovering recurring anomalies in text reports regarding complex space systems, in: Proceedings of the IEEE Aerospace Conference, 2005, pp. 55–63.
[81] G. Tsoumakas, I. Vlahavas, Random k-labelsets: an ensemble method for multilabel classification, in: Proceedings of the 18th European Conference on Machine Learning, 2007, pp. 406–417.
[82] I. Katakis, G. Tsoumakas, I. Vlahavas, Multilabel text classification for automated tag suggestion, in: Proceedings of the ECML/PKDD 2008 Discovery Challenge, Antwerp, Belgium, 2008.
[83] G. Tsoumakas, E. Spyromitros-Xioufi, J. Vilcek, I. Vlahavas, MULAN: A Java library for multi-label learning, Journal of Machine Learning Research 12 (2011) 2411–2414.
[84] K. Sechidis, G. Tsoumakas, I. Vlahavas, On the stratification of multi-label data, in: Proceedings of the 2011 European Conference on Machine Learning and Knowledge Discovery in Databases, Volume Part III, ECML/PKDD-11, Springer-Verlag, 2011, pp. 145–158.
[85] E. Spyromitros, G. Tsoumakas, I. Vlahavas, An empirical study of lazy multilabel classification algorithms, in: Proceedings of SETN-2008, Vol. 5138 of Lecture Notes in Artificial Intelligence, Springer-Verlag Berlin Heidelberg, 2008, pp. 401–406.
[86] Z. Younes, F. Abdallah, T. Denoeux, Multi-label classification algorithm derived from k-nearest neighbor rule with label dependencies, in: Proceedings of the 16th European Signal Processing Conference, Lausanne, Switzerland, 2008, pp. 297–308.
[87] J. Xu, Multi-label weighted k-nearest neighbor classifier with adaptive weight estimation, in: Proceedings of ICONIP 2011, Part II, Vol. 7073 of LNCS, Springer Berlin Heidelberg, 2011, pp. 79–88.
[88] I. Slavkov, An evaluation method for feature rankings, Ph.D. thesis, Josef Stefan International Postgraduate School (2012).
[89] S. García, D. Molina, M. Lozano, F. Herrera, A study on the use of non-parametric tests for analyzing the evolutionary algorithms' behaviour: a case study on the CEC-2005 Special Session on Real Parameter Optimization, Journal of Heuristics, Springer 15 (2009) 617–644.
[90] M. Friedman, A comparison of alternative tests of significance
for the problem of m rankings, Annals of Mathematical Statis-
tics 11 (1940) 86–92.
[91] G. Bergmann, G. Hommel, Improvements of general multiple
test procedures for redundant systems of hypotheses, Multiple
Hypotheses Testing, Springer, Berlin (1988) 100–115.
[92] P. B. Nemenyi, Distribution-free multiple-comparisons., Ph.D.
thesis, Pricenton University (1963).
[93] S. P. Wright, Adjusted p-values for simultaneous inference, Bio-
metrics 48 (1992) 1005–1013.