A Comparative Study of Common Strategies for Binary Classification of Imbalanced Data (cited by: 1)
Abstract Class imbalance is widespread in real-world data, and both the classification methods and the performance evaluation methods for imbalanced data sets differ markedly from those of traditional classification algorithms. Building on an analysis of the commonly used strategies for binary classification of imbalanced data, this paper selects ten public KEEL research data sets and uses the G-mean and AUC values to measure classifier accuracy and generalization performance, respectively. Twelve methods drawn from the three commonly used categories of strategies are evaluated experimentally on the KEEL platform, clarifying the situations to which each algorithm is suited.
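The abstract names G-mean and AUC as the evaluation measures but this record does not give their formulas. The sketch below assumes the standard definitions (G-mean as the geometric mean of sensitivity and specificity, AUC as the probability that a random positive example is scored above a random negative one) and may differ in detail from what the paper uses. The function names, the toy labels and scores, and the 0.5 decision threshold are all illustrative assumptions, not taken from the paper.

```python
import math


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity (label 1 = minority class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 2 minority (1) and 8 majority (0) examples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.4f}")
print(f"G-mean   = {g_mean(y_true, y_pred):.4f}")
print(f"AUC      = {auc(y_true, scores):.4f}")
```

On this toy data the plain accuracy is 0.8 even though half of the minority examples are missed, while the G-mean is lower because it penalizes poor recall on either class. This is the usual motivation for preferring such measures on imbalanced data.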
Authors YANG Xiaojun (杨小军); LIU Zhi (刘志); WANG Limeng (王力猛); LIU Wen (刘文) (Joint Logistics College, National Defense University, Beijing 100858, China)
Source Intelligent Computer and Applications (《智能计算机与应用》), 2020, No. 11, pp. 21-26 (6 pages)
Keywords binary classification of imbalanced data; resampling methods; cost-sensitive learning algorithms; ensemble learning algorithms; KEEL