Abstract
In imbalanced data sets, the disparity between minority-class and majority-class samples makes classification difficult and prone to misclassification. To address this characteristic of imbalanced data, a combined classifier suited to imbalanced data sets was designed. The imbalanced data set was first preprocessed with SMOTE oversampling. Decision stumps were used as the base classifiers, and an AdaBoost classifier was built in Matlab and evaluated on the training and test sets of the demo, heart and usps data sets. The results show that AdaBoost effectively improves classification performance. By changing the weights of the positive (minority-class) samples, the algorithm places more emphasis on classifying the minority class. This improves overall classification performance to a certain extent and yields a classification design for imbalanced data sets.
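The abstract names three ingredients: SMOTE oversampling, decision stumps as base learners, and AdaBoost with extra weight on the positive class. It gives no code, and the paper's own implementation is in Matlab. What follows is a minimal Python sketch of those ideas under stated assumptions, not the author's program. The function names, the `pos_weight` initial-weighting scheme, the neighbour count `k`, the number of rounds and the toy 2-D data are all hypothetical. The demo, heart and usps data sets are not reproduced here.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """SMOTE-style oversampling: each synthetic sample lies on the segment
    between a random minority sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = minority[:i] + minority[i + 1:]
        # k nearest minority neighbours of x (Euclidean distance)
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


def stump_predict(stump, row):
    """Decision stump: one feature compared against one threshold."""
    j, thr, pol = stump
    return pol if row[j] <= thr else -pol


def fit_stump(X, y, w):
    """Return (weighted error, stump) with the lowest weighted error; labels are +1/-1."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict((j, thr, pol), row) != yi)
                if best is None or err < best[0]:
                    best = (err, (j, thr, pol))
    return best


def adaboost(X, y, rounds=10, pos_weight=1.0):
    """Discrete AdaBoost with decision stumps. pos_weight is a hypothetical knob
    (not the paper's exact scheme) scaling the initial weight of +1 (minority) samples."""
    w = [pos_weight if yi == 1 else 1.0 for yi in y]
    total = sum(w)
    w = [wi / total for wi in w]
    ensemble = []
    for _ in range(rounds):
        err, stump = fit_stump(X, y, w)
        if err >= 0.5:  # no better than chance: stop boosting
            break
        err = max(err, 1e-10)  # avoid division by zero when a stump is perfect
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified samples gain weight, correctly classified ones lose it
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1


# Toy imbalanced data (hypothetical): 40 majority (-1) vs 6 minority (+1) points
rng = random.Random(42)
majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
original = [[rng.gauss(2.5, 0.5), rng.gauss(2.5, 0.5)] for _ in range(6)]
synthetic = smote(original, n_new=34, k=3, rng=rng)  # balance to 40 vs 40
X = majority + original + synthetic
y = [-1] * len(majority) + [1] * (len(original) + len(synthetic))
model = adaboost(X, y, rounds=10, pos_weight=2.0)
train_acc = sum(predict(model, r) == t for r, t in zip(X, y)) / len(y)
print(f"training accuracy: {train_acc:.2f}")
```

The sketch addresses the imbalance in two ways. SMOTE rebalances the data before training. `pos_weight` then tilts AdaBoost's initial sample weights toward the minority class. Setting `pos_weight=1.0` recovers plain AdaBoost. A faithful evaluation would also report accuracy on a held-out test set, as the paper does, rather than only training accuracy.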
Author
DONG Qing-wei (董庆伟)
School of Information Management, Minnan University of Science and Technology, Shishi 362700, China
Source
Journal of Changchun Normal University (《长春师范大学学报》)
2022, No. 6, pp. 49-52 (4 pages)
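Funding
Fujian Province Major Education and Teaching Reform Research Project for Undergraduate Universities, "Construction and Research of the Curriculum System for the Information and Computing Science Major Based on School-Enterprise Cooperation" (FBJG20170333)
Fujian Province Education Research Project for Young and Middle-Aged Teachers, "Research on the Influence of E-Commerce on Enterprise Human Resources" (JB12372S)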