Abstract
The synthetic minority oversampling technique (SMOTE) is one of the effective methods for addressing class imbalance. However, SMOTE's linear interpolation restricts each synthetic sample to the line segment connecting two original samples, so the new samples lack diversity, and when that segment passes through a majority-class region the method may generate noisy samples. To address these problems, this paper proposes a minority-sample generation mechanism with hypercuboid constraints. The mechanism uses a hypercuboid, rather than linear interpolation, as the generation region for new samples, which increases the difference between synthetic and original samples. It then checks whether the hypercuboid contains any majority-class samples to decide whether the hypercuboid should be adjusted, which prevents new synthetic samples from falling into the majority-class region. The proposed mechanism replaces linear interpolation in three oversampling methods (SMOTE, Borderline-SMOTE, and ADASYN), and the integrated methods are evaluated experimentally on 11 benchmark datasets from KEEL. The results show that, compared with the original methods, the integrated methods help classifiers achieve higher F1 values and comparable G-mean. This indicates that the hypercuboid generation mechanism can significantly improve a classifier's ability to recognize minority-class samples while still taking majority-class samples into account.
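The contrast between the two generation regions described in the abstract can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not say how the hypercuboid is constructed or adjusted, so the axis-aligned box spanned by a minority sample and its neighbor, the halving rule used to shrink the box, and the names `linear_interpolation`, `hypercuboid_sample`, and `max_shrinks` are all assumptions made here for illustration.

```python
import numpy as np


def linear_interpolation(x, neighbor, rng):
    """Classic SMOTE step: a random point on the segment from x to neighbor."""
    gap = rng.random()  # scalar in [0, 1), shared by every feature
    return x + gap * (neighbor - x)


def hypercuboid_sample(x, neighbor, majority, rng, max_shrinks=10):
    """Draw a point from an axis-aligned box spanned by x and neighbor.

    ASSUMPTION (not stated in the abstract): when the box contains a
    majority-class sample, the far corner is pulled halfway toward x,
    and this is repeated until the box is clean or max_shrinks is hit.
    If the limit is hit, the last (possibly still contaminated) box is used.
    """
    x = np.asarray(x, dtype=float)
    corner = np.asarray(neighbor, dtype=float).copy()
    for _ in range(max_shrinks):
        lo, hi = np.minimum(x, corner), np.maximum(x, corner)
        # A majority sample is "inside" if every feature lies within the box.
        inside = np.all((majority >= lo) & (majority <= hi), axis=1)
        if not inside.any():
            break
        corner = x + 0.5 * (corner - x)  # hypothetical adjustment rule
    lo, hi = np.minimum(x, corner), np.maximum(x, corner)
    # Each feature gets its own random offset, so the sample can leave the segment.
    return rng.uniform(lo, hi)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.array([0.0, 0.0])
    neighbor = np.array([4.0, 4.0])
    majority = np.array([[3.0, 3.0]])  # lies inside the initial box
    print("on the segment:", linear_interpolation(x, neighbor, rng))
    print("inside the box:", hypercuboid_sample(x, neighbor, majority, rng))
```

Because every feature receives an independent random offset, the box-based sample is generally off the connecting segment, which is the source of the extra diversity the abstract refers to. In an oversampler such as SMOTE, Borderline-SMOTE, or ADASYN, a function of this kind would stand in for the interpolation step only; neighbor selection and sample counts would be left unchanged.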
Authors
He Zuowei (贺作伟)
Tao Jiaqing (陶佳晴)
Leng Qiangkui (冷强奎)
Zhai Junchang (翟军昌)
Meng Xiangfu (孟祥福)
Affiliations: College of Information Science & Technology, Bohai University, Jinzhou, Liaoning 121013, China; School of Electronics & Information Engineering, Liaoning Technical University, Huludao, Liaoning 125105, China
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journal (北大核心)
2022, No. 10, pp. 3055-3060 (6 pages)
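Funding
National Natural Science Foundation of China (61602056, 61772249)
Natural Science Foundation of Liaoning Province (2019-ZD-0493)
Scientific Research Project of the Education Department of Liaoning Province (LQ2019012)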
Keywords
imbalanced classification
oversampling technique
SMOTE
generation mechanism
hypercuboid constraints