Abstract
Automatic annotation for multi-label classification of Web objects suffers from low classification performance, because labeling data is time-consuming and labeled data is scarce. To address this problem, this paper proposes a multi-label classification algorithm for Web objects based on a random-jump transition strategy over a sparse hybrid graph. First, a Web object affinity subgraph and a label correlation subgraph are constructed, and an adaptive weighting scheme combines them into a hybrid graph for Web object label classification; this enables semi-supervised automatic annotation and avoids the time cost of manual annotation. Second, to solve the hybrid graph, the random-jump transition strategy assigns probabilities between the objects in the hybrid graph and the predicted labels, which yields probability estimates of the class labels of unlabeled Web objects and their top-k highest relevance scores. Finally, experiments on the UCI Web test set and on real big data show that the proposed algorithm achieves a better Rand index than the comparison algorithms, which verifies its effectiveness.
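The pipeline summarized in the abstract (object graph plus label graph, a random walk over the combined graph, then top-k label scores) can be illustrated with a small sketch. This is not the paper's algorithm: it uses a standard random walk with restart in place of the paper's random-jump transition strategy, a fixed mixing weight `alpha` in place of the adaptive weighting, and a dense toy graph in place of a sparse one. All function names, parameters, and data below are hypothetical.

```python
def normalize_rows(matrix):
    """Scale each row to sum to 1; all-zero rows are left as zeros."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([v / total if total > 0 else 0.0 for v in row])
    return out


def build_hybrid_transition(affinity, label_corr, assignments, alpha=0.5):
    """Transition matrix over n object nodes followed by m label nodes.

    affinity:    n x n object similarity (assumed input)
    label_corr:  m x m label correlation (assumed input)
    assignments: n x m known 0/1 labels; unlabeled objects have zero rows
    alpha:       fixed weight for within-subgraph edges (an assumption;
                 the paper describes an adaptive weighting instead)
    """
    n, m = len(affinity), len(label_corr)
    w = [[0.0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = alpha * affinity[i][j]
        for k in range(m):
            # Known object-label links connect the two subgraphs.
            w[i][n + k] = (1 - alpha) * assignments[i][k]
            w[n + k][i] = (1 - alpha) * assignments[i][k]
    for k in range(m):
        for l in range(m):
            if k != l:
                w[n + k][n + l] = alpha * label_corr[k][l]
    return normalize_rows(w)


def random_walk_with_restart(p_matrix, start, restart=0.15,
                             iters=500, tol=1e-12):
    """Iterate p <- restart * e_start + (1 - restart) * p * P."""
    size = len(p_matrix)
    e = [0.0] * size
    e[start] = 1.0
    p = e[:]
    for _ in range(iters):
        nxt = [restart * e[j] for j in range(size)]
        for i in range(size):
            if p[i] == 0.0:
                continue
            for j in range(size):
                nxt[j] += (1 - restart) * p[i] * p_matrix[i][j]
        diff = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if diff < tol:
            break
    return p


def top_k_labels(p_matrix, n_objects, obj, k):
    """Return the k (label_index, score) pairs with the highest scores."""
    scores = random_walk_with_restart(p_matrix, obj)[n_objects:]
    ranked = sorted(range(len(scores)), key=lambda idx: scores[idx],
                    reverse=True)
    return [(idx, scores[idx]) for idx in ranked[:k]]


# Toy data: objects 0 and 3 are labeled, objects 1 and 2 are not.
affinity = [
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.9],
    [0.0, 0.1, 0.9, 0.0],
]
label_corr = [[0.0, 0.2], [0.2, 0.0]]
assignments = [[1, 0], [0, 0], [0, 0], [0, 1]]

P = build_hybrid_transition(affinity, label_corr, assignments)
best_for_1 = top_k_labels(P, n_objects=4, obj=1, k=1)
best_for_2 = top_k_labels(P, n_objects=4, obj=2, k=1)
```

In this toy setup, unlabeled object 1 is most similar to object 0 and so ranks label 0 first, while object 2 follows object 3 and ranks label 1 first. A larger graph would call for a sparse representation, which the sketch omits for brevity.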
Authors
汪忠国
吴敏
谭芳芳
WANG Zhongguo; WU Min; TAN Fangfang (Anhui Institute of Information Technology, Wuhu, Anhui 241000, China; School of Software Engineering, University of Science and Technology of China, Hefei 230051, China; Foundation Teaching Department, Anhui Institute of Information Technology, Wuhu, Anhui 241000, China)
Source
《计算机科学与探索》
CSCD
Peking University Core Journals (北大核心)
2017, Issue 7, pp. 1166-1174 (9 pages)
Journal of Frontiers of Computer Science and Technology
Funding
Natural Science Research Project of the Education Department of Anhui Province, No. KJ2016A075