Computer Science (计算机科学), 2020, Vol. 47, Issue (2): 44-50. doi: 10.11896/jsjkx.181202285
WANG Sheng-wu,CHEN Hong-mei
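Abstract: With the development of Internet and Internet of Things technologies, collecting data has become increasingly easy. However, high-dimensional data contain many redundant and irrelevant features. Using such data directly increases the computational cost of a model and can even degrade its performance, so dimensionality reduction is necessary. Feature selection reduces the number of features, which lowers computational cost and removes redundant features, thereby improving the performance of machine learning models. Because it retains the original features of the data, it also offers good interpretability. Feature selection has become one of the important data preprocessing steps in machine learning. Rough set theory is an effective method for feature selection: it preserves the characteristics of the original features while removing redundant information. However, because evaluating all combinations of feature subsets is expensive, traditional rough-set-based feature selection methods have difficulty finding the globally optimal feature subset. To address this problem, this paper proposes a feature selection method based on rough sets and an improved whale optimization algorithm. To keep the whale optimization algorithm from becoming trapped in local optima, an improved whale algorithm with population optimization and perturbation strategies is proposed. The algorithm first randomly initializes a set of feature subsets, then evaluates each subset with an objective function based on rough set attribute dependency, and finally applies the improved whale optimization algorithm, iterating until an acceptable near-optimal feature subset is found. Experimental results on UCI datasets show that, with a support vector machine as the classifier used for evaluation, the proposed algorithm finds feature subsets with less information loss and achieves higher classification accuracy. The proposed algorithm therefore has certain advantages for feature selection.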
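The abstract evaluates candidate feature subsets with an objective based on rough set attribute dependency but does not give the formula. As a minimal sketch, the block below computes the standard dependency degree, the fraction of objects whose equivalence class under the selected features maps to a single decision label. The function name `dependency_degree`, the toy decision table, and the use of the plain dependency degree as the score are illustrative assumptions, not the paper's actual objective function.

```python
from collections import defaultdict


def dependency_degree(rows, feature_idx, labels):
    """Standard rough-set dependency degree |POS_B(D)| / |U| (illustrative helper)."""
    if not rows:
        raise ValueError("empty decision table")
    # Group objects into equivalence classes by their values on the chosen features.
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        key = tuple(row[j] for j in feature_idx)
        classes[key].append(i)
    # A class belongs to the positive region when all its objects share one label.
    positive = sum(
        len(members)
        for members in classes.values()
        if len({labels[i] for i in members}) == 1
    )
    return positive / len(rows)


# Hypothetical toy decision table: three categorical features, one decision label.
rows = [
    ("sunny", "hot", "high"),
    ("sunny", "hot", "low"),
    ("rainy", "mild", "high"),
    ("rainy", "mild", "high"),
    ("rainy", "cool", "low"),
]
labels = ["no", "yes", "yes", "no", "yes"]

full = dependency_degree(rows, [0, 1, 2], labels)
reduced = dependency_degree(rows, [1, 2], labels)
single = dependency_degree(rows, [0], labels)
```

In this toy table, the subset `[1, 2]` reaches the same dependency as the full feature set, so feature 0 is redundant with respect to the decision, while feature 0 alone discerns nothing. A search procedure such as the whale optimization algorithm would use a score of this kind to compare candidate subsets.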