Abstract
Existing bookmark spam detection methods perform worse when little user profile information is available. To address this problem, an ensemble SVM approach that incorporates confidence is proposed for detecting bookmark spam. First, Bootstrap sampling with replacement is applied to the training data to obtain a training subset for each individual SVM. Then, a sigmoid function is fitted directly to the standard output of each SVM to obtain a posterior probability, which serves as the confidence of the class output. Finally, a confidence-based fusion method, which performs better than a voting strategy, is proposed to combine the outputs of the individual SVMs. Experimental results show that the proposed approach achieves better detection performance than existing methods when little user profile information is available.
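The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact fusion rule, so averaging the posterior probabilities is an assumption made here for illustration only, as are the sigmoid parameters `a` and `b` (normally fitted on training data) and the example decision values. All function names are hypothetical.

```python
import math
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement (one training subset)."""
    return [rng.choice(data) for _ in data]


def platt_probability(f, a=-1.0, b=0.0):
    """Map an SVM decision value f to P(spam | f) = 1 / (1 + exp(a*f + b)).

    a and b would normally be fitted to training data; the defaults
    here are placeholders, not values from the paper.
    """
    return 1.0 / (1.0 + math.exp(a * f + b))


def majority_vote(decision_values):
    """Baseline: each SVM casts +1 (spam) or -1 (legitimate) by the sign of f.

    A tie is resolved as -1.
    """
    votes = sum(1 if f > 0 else -1 for f in decision_values)
    return 1 if votes > 0 else -1


def confidence_fusion(decision_values, a=-1.0, b=0.0):
    """Assumed fusion rule: average the posteriors, predict spam if mean > 0.5."""
    probs = [platt_probability(f, a, b) for f in decision_values]
    mean_p = sum(probs) / len(probs)
    return (1 if mean_p > 0.5 else -1), mean_p


if __name__ == "__main__":
    rng = random.Random(0)
    subset = bootstrap_sample(list(range(10)), rng)
    print("bootstrap subset:", subset)

    # Hypothetical decision values from three individual SVMs for one user:
    # one confident "spam" output and two weak "legitimate" outputs.
    outputs = [2.5, -0.2, -0.3]
    print("majority vote:", majority_vote(outputs))
    print("confidence fusion:", confidence_fusion(outputs))
```

With these hypothetical values, the vote follows the two weakly negative classifiers, while the averaged posterior follows the one confident classifier. This shows how a confidence-based rule can differ from voting; it does not reproduce the paper's results.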
Source
Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》)
EI
CSCD
PKU Core Journal (北大核心)
2011, No. 4, pp. 591-596 (6 pages)
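Funding
National Basic Research Program of China (973 Program) (No. 2005CB321902)
Natural Science Foundation of Hebei Province (No. F2008000877, F2011203219)
Supported by the Special Research Project on Rapid Sharing of Scientific Papers in the Network Era, Science and Technology Development Center of the Ministry of Education (No. 20091333110011)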
Keywords
Bookmark Spam, Spam Detection, Support Vector Machine, Confidence, Ensemble Learning