Research on Online Feature Selection Algorithm based on Sparse Projection in Big Data
Abstract  Most studies of online learning require access to all the attributes/features of each training instance. This classical setting is often impractical in big data applications, where data instances may be of very high dimensionality and acquiring the full set of attributes/features is too expensive. To address this problem, an improved Perceptron algorithm based on a truncation technique is first proposed for online feature selection. Then, to overcome the relatively high error rate of that algorithm, an online feature selection algorithm based on sparse projection (OFS) is proposed, and a theoretical analysis of its mistake bound is given. Finally, experimental results on a variety of public data sets show that the proposed algorithm outperforms well-known batch feature selection algorithms in terms of the online average mistake rate and time efficiency, and that it holds broad promise for large-scale applications.
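The abstract outlines two mistake-driven procedures: an improved Perceptron that uses truncation for online feature selection, and an online feature selection algorithm based on sparse projection (OFS). Below is a minimal sketch of that kind of update, assuming a Perceptron-style step followed by an L2-ball projection and truncation to at most B nonzero weights; the parameter names eta, lam and B and the toy data stream are illustrative assumptions, not details taken from the paper.

# Minimal sketch (not the paper's exact implementation) of one round of
# sparse-projection online feature selection: a Perceptron-style update,
# an L2-ball projection, and truncation to the B largest-magnitude weights.
import numpy as np

def truncate(w, B):
    """Keep the B largest-magnitude components of w and zero out the rest."""
    if np.count_nonzero(w) <= B:
        return w
    keep = np.argsort(np.abs(w))[-B:]          # indices of the B largest |w_i|
    w_trunc = np.zeros_like(w)
    w_trunc[keep] = w[keep]
    return w_trunc

def ofs_update(w, x, y, eta=0.2, lam=0.01, B=10):
    """One online round: w is the weight vector, x the instance, y in {-1, +1}.

    The weights are changed only when the current prediction is wrong
    (mistake-driven, as in the Perceptron).
    """
    if y * np.dot(w, x) <= 0:                  # prediction mistake
        w = w + eta * y * x                    # Perceptron-style gradient step
        norm = np.linalg.norm(w)
        if norm > 0:                           # project onto the L2 ball of radius 1/sqrt(lam)
            w = w * min(1.0, 1.0 / (np.sqrt(lam) * norm))
        w = truncate(w, B)                     # sparse projection: keep at most B features
    return w

# Toy usage: a stream of random instances whose labels depend on the first
# five features; the learner keeps at most 10 active features.
rng = np.random.default_rng(0)
d = 100
w = np.zeros(d)
for _ in range(1000):
    x = rng.normal(size=d)
    y = 1 if x[:5].sum() > 0 else -1
    w = ofs_update(w, x, y, B=10)
print("selected features:", np.flatnonzero(w))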
Author  Zhang Zimin (Center of Education Technology, Hezhou University, Hezhou 542899, China)
Source  Journal of Hunan University of Science and Technology (Natural Science Edition), 2018, No. 3, pp. 93-101 (9 pages); indexed by CAS and the Peking University Core Journal list
Funding  Scientific and Technological Research Project of Guangxi Higher Education Institutions (2013LX143)
Keywords  online learning; truncation technique; sparse projection; feature selection; online average mistake rate; big data mining
