Computer Science ›› 2021, Vol. 48 ›› Issue (4): 91-96. doi: 10.11896/jsjkx.200800025
丁思凡, 王锋, 魏巍
DING Si-fan, WANG Feng, WEI Wei
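Abstract: Feature selection plays a vital role in machine learning and data mining. Relief, an efficient filter-based feature selection algorithm, can handle multiple types of data and is fairly tolerant of noise, so it is widely used. However, the classic Relief algorithm evaluates discrete features in a simplistic way and does not fully exploit the potential relationship between features and class labels during feature selection, which leaves considerable room for improvement. To address this shortcoming, a discrete feature evaluation method based on label correlation is proposed. The algorithm takes the characteristics of different feature types into account and gives a distance measure for mixed features; starting from the correlation between discrete features and labels, it also redefines how Relief evaluates discrete features. Experimental results show that, compared with the classic Relief algorithm and several existing feature selection algorithms for mixed data, the improved Relief algorithm achieves varying degrees of improvement in classification accuracy.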
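For orientation, the classic Relief baseline that the abstract sets out to improve can be sketched as follows. This is a minimal illustration and not the authors' method: the abstract gives no formulas, so the label-correlation evaluation of discrete features is not reproduced here. The names `diff` and `relief_weights` are hypothetical, and for determinism the sketch visits every instance once instead of sampling random instances as the original algorithm does.

```python
def diff(a, b, is_discrete, value_range):
    """Per-feature difference in [0, 1], as in classic Relief."""
    if is_discrete:
        # Classic Relief only checks equality for discrete values;
        # this is the "simple" evaluation the abstract criticises.
        return 0.0 if a == b else 1.0
    # Numeric features: absolute difference scaled by the observed range.
    return abs(a - b) / value_range if value_range else 0.0


def relief_weights(X, y, discrete):
    """Return one Relief weight per feature for mixed data.

    X is a list of rows, y a list of class labels, and discrete a list
    of booleans marking which columns are discrete (all assumptions of
    this sketch, not an interface taken from the paper).
    """
    n, d = len(X), len(X[0])
    ranges = [
        0.0 if discrete[j] else max(row[j] for row in X) - min(row[j] for row in X)
        for j in range(d)
    ]

    def feature_diffs(i, k):
        return [diff(X[i][j], X[k][j], discrete[j], ranges[j]) for j in range(d)]

    weights = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue
        # Nearest neighbour of the same class and of a different class,
        # using the summed per-feature differences as the distance.
        near_hit = min(hits, key=lambda k: sum(feature_diffs(i, k)))
        near_miss = min(misses, key=lambda k: sum(feature_diffs(i, k)))
        hit_d = feature_diffs(i, near_hit)
        miss_d = feature_diffs(i, near_miss)
        for j in range(d):
            # Reward features that separate classes, penalise features
            # that differ between same-class neighbours.
            weights[j] += (miss_d[j] - hit_d[j]) / n
    return weights


# Toy mixed data: column 0 is discrete and matches the label,
# column 1 is numeric and unrelated to the label.
X = [["a", 0.1], ["a", 0.9], ["b", 0.2], ["b", 0.8]]
y = [0, 0, 1, 1]
print(relief_weights(X, y, discrete=[True, False]))
```

On this toy data the discrete column receives the higher weight. Because `diff` reduces every pair of discrete values to 0 or 1, it ignores how each value relates to the class labels, which is the gap the proposed label-correlation evaluation targets.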