Abstract
To address the problem that a high-quality classification model cannot be built when the target domain has few training samples, a cross-domain classification method based on an ensemble bagging algorithm is proposed within a transfer learning framework. Source-domain data are introduced and filtered, the resulting mixed data set is used for learning, a classification model based on the ensemble bagging algorithm is built, and predictions are obtained by voting. Comparative simulation results show that the ensemble bagging algorithm with Bayesian base classifiers improves transfer from the source domain and raises classification accuracy and generalization performance in the target domain. An analysis of the amount of noisy data in the source domain shows that the algorithm can partially avoid negative transfer.
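The pipeline summarized in the abstract (filter source data, mix it with target data, train bagged Bayesian classifiers, vote) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the selection criterion, so `select_source` below uses a hypothetical rule (keep a source sample only if a model trained on the target data predicts its label), and the class `NaiveBayes`, the functions `train_bagging` and `vote`, the parameter `n_estimators`, and the toy data are all assumptions made for illustration.

```python
import math
import random
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over token lists, with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            score = math.log(self.class_counts[label] / total_docs)
            # The extra +1 reserves probability mass for unseen tokens.
            denom = sum(self.word_counts[label].values()) + len(self.vocab) + 1
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def select_source(source, target):
    """Hypothetical filter (an assumption, not the paper's rule):
    keep source samples whose label a target-trained model reproduces."""
    docs, labels = zip(*target)
    model = NaiveBayes().fit(docs, labels)
    return [(d, y) for d, y in source if model.predict(d) == y]


def train_bagging(data, n_estimators=15, seed=0):
    """Train one naive Bayes model per bootstrap resample of the mixed data."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        sample = [rng.choice(data) for _ in data]  # sample with replacement
        docs, labels = zip(*sample)
        models.append(NaiveBayes().fit(docs, labels))
    return models


def vote(models, tokens):
    """Majority vote over the individual classifiers."""
    votes = Counter(m.predict(tokens) for m in models)
    return votes.most_common(1)[0][0]


# Toy data: a small target domain and a larger source domain with one noisy label.
target = [
    ("cheap pills offer".split(), "spam"),
    ("meeting agenda monday".split(), "ham"),
]
source = [
    ("cheap offer win prize".split(), "spam"),
    ("project meeting notes agenda".split(), "ham"),
    ("meeting agenda schedule".split(), "spam"),  # noisy source sample
]

kept = select_source(source, target)   # the noisy sample is filtered out
models = train_bagging(target + kept)  # learn on the mixed data set
print(vote(models, "win cheap prize".split()))
```

In this toy run the filter discards the mislabeled source sample before training, which is one simple way to picture how screening source data might limit negative transfer; the paper's actual criterion may differ.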
Source
Computer Engineering and Design (《计算机工程与设计》)
Peking University Core Journal (北大核心)
2015, No. 7, pp. 1808-1812 (5 pages)
Keywords
text classification
selection
transfer learning
ensemble bagging algorithm
negative transfer