Abstract
Ensemble learning for imbalanced data classification suffers from two problems: the data subsets produced by partitioning rarely cover the overall data distribution, and the weights of the base classifiers are set subjectively. To address these problems, a matrix-granule weighted ensemble classification algorithm for imbalanced data is proposed. First, a bagging granulation algorithm divides the data set into a number of matrix-granule data subsets. Next, a matrix distance algorithm computes the distance between each matrix granule and the all-ones matrix, which serves as the weight in the ensemble rule, and a CART base classifier is trained separately on each matrix granule. Finally, ensemble classification is carried out with the matrix-granule distance-weighted ensemble rule. The algorithm was validated on imbalanced data sets from the KEEL database. The simulation results show that the matrix-granule weighted ensemble classification algorithm achieves relatively high classification accuracy and is a useful attempt at, and supplement to, research on imbalanced data classification algorithms.
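The pipeline summarized in the abstract (bagging granulation, a distance-to-all-ones weight per granule, one CART classifier per granule, weighted voting) can be illustrated with a minimal sketch. The abstract does not define the matrix distance, how a distance is turned into a weight, or how granules are built, so everything specific here is an assumption: each granule is a plain bootstrap sample, the distance is the Frobenius norm, weights are the distances normalized to sum to 1, features are taken to be scaled to [0, 1], and the tree is a tiny Gini-based stand-in for CART. All function names are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_cart(X, y, depth=0, max_depth=3):
    """Tiny CART-style tree: binary splits chosen by weighted Gini impurity."""
    if depth == max_depth or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]  # leaf: majority label
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:  # no valid split (all rows identical)
        return Counter(y).most_common(1)[0][0]
    _, j, t, left, right = best
    return (j, t,
            build_cart([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
            build_cart([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth))

def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are (feature, threshold, left, right)."""
    while isinstance(node, tuple):
        j, t, low, high = node
        node = low if row[j] <= t else high
    return node

def distance_to_ones(granule):
    """Assumed distance: Frobenius norm of (granule - all-ones matrix)."""
    return math.sqrt(sum((v - 1.0) ** 2 for row in granule for v in row))

def fit_ensemble(X, y, n_granules=5, seed=0):
    """Draw bootstrap granules, train one tree per granule, weight by distance."""
    rng = random.Random(seed)
    n = len(X)
    members = []
    for _ in range(n_granules):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample = one granule
        gX = [X[i] for i in idx]
        gy = [y[i] for i in idx]
        members.append((build_cart(gX, gy), distance_to_ones(gX)))
    total = sum(d for _, d in members) or 1.0  # assumed normalization of weights
    return [(tree, d / total) for tree, d in members]

def predict(ensemble, row):
    """Weighted vote over the base classifiers."""
    votes = Counter()
    for tree, weight in ensemble:
        votes[predict_tree(tree, row)] += weight
    return votes.most_common(1)[0][0]

# Toy imbalanced data, features already in [0, 1]: 12 majority (0), 3 minority (1).
X = [[0.1 + 0.02 * i, 0.2] for i in range(12)] + [[0.9, 0.8], [0.85, 0.9], [0.95, 0.85]]
y = [0] * 12 + [1] * 3
ensemble = fit_ensemble(X, y)
print(predict(ensemble, [0.1, 0.2]), predict(ensemble, [0.9, 0.85]))
```

Replacing subjective classifier weights with a quantity computed from each granule is the idea the abstract states. Whether a larger distance should raise or lower a classifier's weight is not given there, so the direction used in this sketch is only a placeholder.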
Authors
王荣杰
代琪
赵佳亮
陈丽芳
WANG Rong-jie; DAI Qi; ZHAO Jia-liang; CHEN Li-fang (College of Science, North China University of Science and Technology, Tangshan, Hebei 063210, China; Department of Automation, China University of Petroleum, Beijing 102249, China)
Source
《华北理工大学学报(自然科学版)》
CAS
2021, No. 3, pp. 125-132 (8 pages)
Journal of North China University of Science and Technology: Natural Science Edition
Funding
Supported by the Natural Science Foundation of Hebei Province (F2014209086).
Keywords
matrix-granules
imbalanced data
weight
decision tree
ensemble learning