Abstract
[Purposes] Co-training cannot be applied directly to single-view data, and the unlabeled samples added during iteration often carry too little useful information. To address both problems, an improved partial co-training algorithm based on kernel mean shift clustering is proposed. [Methods] The algorithm first trains a full-view classifier h1 on the labeled sample set and, at the same time, selects a high-value feature subset to train a partial-view classifier h2. Kernel mean shift is then applied to the unlabeled sample set to select the samples that fall within a specified bandwidth during clustering. These samples are labeled by classifier h2 and added to the training of classifier h1 to refine the classification model. [Findings] Three groups of comparative experiments on UCI datasets validate the algorithm, and the results show that it performs better on the model evaluation measures. [Conclusions] The improved partial co-training algorithm divides a dataset into a partial view and a full view, which solves the view partitioning problem for single-view data. Kernel mean shift selects unlabeled samples that better represent the spatial structure of the data, which reduces the error introduced by unlabeled samples.
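The workflow described in the abstract can be illustrated with a minimal sketch of a single round. This is not the authors' implementation. It assumes a nearest-centroid classifier as a stand-in for h1 and h2, a simple between-class/within-class spread score as the feature-selection criterion, and a plain Gaussian mean shift in input space in place of the paper's kernel mean shift. All function names and parameters (`top_features`, `select_near_modes`, `partial_cotrain_round`, `k`, `bandwidth`) are hypothetical.

```python
import math

def centroid_fit(X, y, feats):
    """Stand-in classifier: per-class centroids over the feature indices in feats."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(feats))
        for j, f in enumerate(feats):
            acc[j] += x[f]
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in sums[c]] for c in sums}, feats

def centroid_predict(model, x):
    """Return the class whose centroid is closest to x on the model's features."""
    centroids, feats = model
    return min(centroids,
               key=lambda c: sum((x[f] - m) ** 2 for f, m in zip(feats, centroids[c])))

def top_features(X, y, k):
    """Assumed criterion: rank features by between-class / within-class spread."""
    scores = []
    for f in range(len(X[0])):
        col = [x[f] for x in X]
        mean = sum(col) / len(col)
        between = within = 0.0
        for c in set(y):
            vals = [v for v, label in zip(col, y) if label == c]
            mc = sum(vals) / len(vals)
            between += len(vals) * (mc - mean) ** 2
            within += sum((v - mc) ** 2 for v in vals)
        scores.append((between / (within + 1e-12), f))
    return [f for _, f in sorted(scores, reverse=True)[:k]]

def mean_shift_mode(x, data, bandwidth, iters=50, tol=1e-6):
    """Shift x toward a density mode of data using Gaussian kernel weights."""
    x = list(x)
    for _ in range(iters):
        w = [math.exp(-math.dist(x, p) ** 2 / (2 * bandwidth ** 2)) for p in data]
        total = sum(w)
        new = [sum(wi * p[d] for wi, p in zip(w, data)) / total for d in range(len(x))]
        if math.dist(new, x) < tol:
            return new
        x = new
    return x

def select_near_modes(U, bandwidth):
    """Keep unlabeled samples that lie within one bandwidth of their own mode."""
    return [u for u in U if math.dist(u, mean_shift_mode(u, U, bandwidth)) <= bandwidth]

def partial_cotrain_round(XL, yL, U, k, bandwidth):
    """One round: h2 (partial view) labels the selected samples, then h1 is retrained."""
    all_feats = list(range(len(XL[0])))
    h2 = centroid_fit(XL, yL, top_features(XL, yL, k))      # partial-view classifier
    picked = select_near_modes(U, bandwidth)                # mean-shift selection
    pseudo = [centroid_predict(h2, u) for u in picked]      # pseudo-labels from h2
    h1 = centroid_fit(XL + picked, yL + pseudo, all_feats)  # full-view classifier
    return h1, picked, pseudo
```

Calling `partial_cotrain_round(XL, yL, U, k=1, bandwidth=1.0)` on lists of feature vectors returns the retrained full-view model together with the selected samples and their pseudo-labels. An iterative version would repeat the round on the remaining unlabeled samples until none are selected.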
Authors
鲜焱
吕佳
XIAN Yan; Lü Jia (College of Computer and Information Sciences, Chongqing Normal University; Chongqing Center of Engineering Technology Research on Digital Agriculture Service, Chongqing Normal University, Chongqing 401331, China)
Source
《重庆师范大学学报(自然科学版)》
CAS
Peking University Core Journals (北大核心)
2020, No. 4, pp. 106-113 (8 pages)
Journal of Chongqing Normal University: Natural Science
Funding
National Natural Science Foundation of China (No.1971084)
Chongqing Normal University Research Project (No.YKC19018)
Keywords
co-training
mean shift
manifold regularization
feature selection
view partition