
Research on Classification of Improved Smote Algorithm on Imbalanced Datasets (cited by: 9)
Abstract: On imbalanced datasets, oversampling algorithms such as Smote (Synthetic Minority Oversampling Technique), R-Smote, and SD-ISmote may blur the boundary between the majority and minority classes and may use noisy data to synthesize new samples. The ImprovedSmote algorithm proposed in this paper uses the cluster centers of the minority set together with the minority samples of the corresponding class: new data are generated by random interpolation within the shape formed by a cluster center and a number of corresponding minority samples that does not exceed the number of sample attributes. Combined with the C4.5 decision tree and a neural network algorithm, ImprovedSmote achieves better results on the experimental datasets than Smote, R-Smote, and SD-ISmote, and can effectively improve classifier performance.
Authors: YI Wei (易未); MAO Li (毛力); SUN Jun (孙俊); WU Lin-hai (吴林海) (School of Internet of Things, Jiangnan University, Wuxi 214122, China; School of Business, Jiangnan University, Wuxi 214122, China; Food Safety Risk Management Institute, Jiangnan University, Wuxi 214122, China)
Source: Computer and Modernization (《计算机与现代化》), 2018, No. 3, pp. 83-88 (6 pages)
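Funding: National Special Research Project for the Grain Public Welfare Industry (201513004-6); Sub-project of the National Science and Technology Program for Rural Areas during the 12th Five-Year Plan (2015BAD17B02-8); Special Fund for the Modern Agricultural Industry Technology System (CARS-49); Jiangsu Province Industry-University-Research Cooperation Project (BY2015019-30)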
Keywords: imbalanced dataset; Smote; R-Smote; SD-ISmote; ImprovedSmote; cluster center
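
The interpolation step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how the minority set is clustered, how many samples are picked per synthetic point, or how the interpolation weights are drawn. The sketch assumes that cluster labels are supplied by the caller, that a cluster is picked uniformly at random, and that a random convex combination (Dirichlet weights) of the cluster center and the chosen samples stands in for "random interpolation within the shape". The function name `improved_smote_sketch` and its parameters are hypothetical.

```python
import numpy as np


def improved_smote_sketch(X_min, cluster_labels, n_new, seed=0):
    """Generate n_new synthetic minority samples (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    cluster_labels = np.asarray(cluster_labels)
    n_features = X_min.shape[1]
    clusters = np.unique(cluster_labels)
    synthetic = np.empty((n_new, n_features))
    for i in range(n_new):
        # Assumption: pick one minority cluster uniformly at random.
        c = rng.choice(clusters)
        members = X_min[cluster_labels == c]
        center = members.mean(axis=0)
        # Use no more samples than there are attributes, as the abstract states.
        m = rng.integers(1, min(n_features, len(members)) + 1)
        chosen = members[rng.choice(len(members), size=m, replace=False)]
        # Vertices of the shape: the cluster center plus the chosen samples.
        vertices = np.vstack([center, chosen])
        # Assumption: Dirichlet weights yield a random convex combination,
        # so the new point lies inside the shape spanned by the vertices.
        weights = rng.dirichlet(np.ones(len(vertices)))
        synthetic[i] = weights @ vertices
    return synthetic


# Toy minority set with two well-separated clusters (made-up data).
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                  [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
labels = np.array([0, 0, 0, 1, 1, 1])
new_samples = improved_smote_sketch(X_min, labels, n_new=20)
```

Because every synthetic point is a convex combination of vertices taken from a single cluster, it stays inside that cluster's convex hull. That is one way to read the abstract's claim that the method avoids blurring the class boundary, in contrast to interpolating between arbitrary minority neighbors.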
