Abstract
Class imbalance is widespread in real-world data, and both the classification methods for imbalanced data sets and the methods for evaluating their performance differ markedly from those of traditional classification algorithms. Building on an analysis of commonly used strategies for binary imbalanced data classification, this paper selects ten public KEEL research data sets and uses G-mean and AUC to measure the accuracy and the generalization performance of the classifiers, respectively. Twelve methods drawn from three commonly used categories of strategies are evaluated experimentally on the KEEL platform, clarifying the situations to which each algorithm is suited.
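The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions (G-mean as the geometric mean of sensitivity and specificity, AUC as the probability that a randomly chosen positive sample is scored above a randomly chosen negative one); the function names `g_mean` and `auc` and the toy labels are hypothetical and are not taken from the paper or from the KEEL platform.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity (standard definition)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def auc(y_true, scores, positive=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking. Quadratic in the sample count,
    which is fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy imbalanced set (hypothetical): 2 minority samples, 8 majority samples.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_majority = [0] * 10  # a classifier that never predicts the minority class

plain_accuracy = sum(t == p for t, p in zip(labels, always_majority)) / len(labels)
```

On this toy set the always-majority classifier reaches a plain accuracy of 0.8 while its G-mean is 0, which is the usual motivation for preferring G-mean and AUC over plain accuracy on imbalanced data.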
Authors
杨小军
刘志
王力猛
刘文
YANG Xiaojun; LIU Zhi; WANG Limeng; LIU Wen (Joint Logistics College, National Defense University, Beijing 100858, China)
Source
《智能计算机与应用》
2020, No. 11, pp. 21-26 (6 pages)
Intelligent Computer and Applications