
Im-IG: a novel feature selection method for imbalanced problems

Cited by: 9
Abstract: Unequal numbers of samples across classes, known as the imbalanced problem, are pervasive in machine learning and have attracted much attention. The information gain (IG) method is widely used for feature selection, yet its behavior on imbalanced problems has rarely been studied. Based on an analysis of IG's performance on data sets with different degrees of class balance, this paper proposes Im-IG (imbalanced information gain), a new feature selection method for imbalanced problems. Im-IG increases the weight of the minority-class distribution in the information entropy calculation, so that features which help separate the minority class correctly are selected first. While improving overall classification performance, it focuses on raising the accuracy on the minority class. Experimental results on several imbalanced data sets show that Im-IG largely resolves IG's poor fit to imbalanced problems and is an effective feature selection method for them.
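The abstract does not state Im-IG's exact weighting formula, so the Python sketch below is only a hedged illustration of the general idea. It computes standard information gain, IG(Y; X) = H(Y) - Σ_v P(X=v) H(Y | X=v), for a discrete feature, plus an optional class-weighted variant in which every sample of class c counts as `class_weight[c]` samples (for example, inverse class frequency). The `class_weight` parameter and this reweighting scheme are assumptions made for illustration; they are not the paper's definition of Im-IG.

```python
import math
from collections import defaultdict


def _entropy(mass):
    """Shannon entropy in bits of a dict mapping class -> non-negative mass."""
    total = sum(mass.values())
    h = 0.0
    for m in mass.values():
        if m > 0:
            p = m / total
            h -= p * math.log2(p)
    return h


def information_gain(feature, labels, class_weight=None):
    """IG(Y; X) = H(Y) - sum_v P(X=v) * H(Y | X=v) for a discrete feature X.

    class_weight is a HYPOTHETICAL dict class -> weight (not from the paper).
    Each sample of class c counts as class_weight[c] samples, and every
    probability below is taken from that reweighted sample. None gives
    plain, unweighted IG.
    """
    w = class_weight or {}
    joint = defaultdict(lambda: defaultdict(float))  # feature value -> class -> mass
    class_mass = defaultdict(float)                   # class -> total mass
    for x, y in zip(feature, labels):
        m = w.get(y, 1.0)
        joint[x][y] += m
        class_mass[y] += m
    total = sum(class_mass.values())
    gain = _entropy(class_mass)                       # H(Y)
    for per_class in joint.values():
        # Subtract P(X=v) * H(Y | X=v), both under the (re)weighted sample.
        gain -= sum(per_class.values()) / total * _entropy(per_class)
    return gain


if __name__ == "__main__":
    # 8 majority samples, 2 minority samples; the feature isolates the minority.
    y = ["maj"] * 8 + ["min"] * 2
    x = [0] * 8 + [1] * 2
    print(round(information_gain(x, y), 3))                           # 0.722
    print(round(information_gain(x, y, {"maj": 1.0, "min": 4.0}), 3))  # 1.0
```

In this toy example, upweighting the minority class raises the score of a feature that isolates it. The reweighting is applied to every probability (class prior, P(X=v) and the conditional class distributions), not only to the class entropies, so the result is an ordinary information gain on a reweighted sample and, up to floating-point rounding, cannot be negative.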
Source: Journal of Shandong University (Engineering Science), CAS-indexed, Peking University Core Journal; 2010, no. 5, pp. 123-128 (6 pages)
Funding: National Natural Science Foundation of China (60873129, 30901897); Shanghai Rising-Star Program (08QA1403200)
Keywords: Im-IG method; imbalanced problem; feature selection