An Ensemble SVM Approach Integrated with Confidence for Detecting Bookmark Spam


Abstract: Existing bookmark spam detection methods lose accuracy when little user profile information is available. To address this problem, an ensemble SVM approach integrated with confidence is proposed. First, Bootstrap sampling with replacement is applied to the training data to obtain a training subset for each individual SVM. Then the standard output of each SVM is fitted directly to a sigmoid function to obtain a posterior probability, which serves as the confidence of the class output. Finally, a confidence-integrated fusion method, which performs better than the voting strategy, is proposed to combine the outputs of the individual SVMs. Experimental results show that the proposed approach detects bookmark spam well and outperforms existing methods when little user profile information is available.
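The three steps named in the abstract (Bootstrap sampling, sigmoid mapping of SVM outputs, and confidence-based fusion) can be illustrated with a minimal Python sketch. The abstract does not give the paper's exact fusion rule or sigmoid parameters, so everything here is an assumption made for illustration: the function names are hypothetical, the sigmoid parameters `a` and `b` are placeholder values rather than fitted ones, the fusion rule is a simple mean of posterior probabilities, and the SVM decision values are supplied by hand instead of coming from trained classifiers.

```python
import math
import random


def bootstrap_sample(samples, rng):
    """Draw len(samples) items with replacement: one training subset per SVM."""
    return [rng.choice(samples) for _ in samples]


def sigmoid_posterior(f, a=-1.0, b=0.0):
    """Map an SVM decision value f to P(spam | f) = 1 / (1 + exp(a*f + b)).

    a and b would normally be fitted on labeled data; the defaults are
    placeholders chosen only for this sketch.
    """
    z = max(min(a * f + b, 700.0), -700.0)  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(z))


def majority_vote(decision_values):
    """Baseline: each SVM casts one vote, +1 (spam) or -1 (legitimate)."""
    votes = sum(1 if f > 0 else -1 for f in decision_values)
    return 1 if votes > 0 else -1


def confidence_fusion(decision_values, a=-1.0, b=0.0):
    """Assumed fusion rule: average the posteriors and threshold at 0.5."""
    probs = [sigmoid_posterior(f, a, b) for f in decision_values]
    mean_prob = sum(probs) / len(probs)
    return 1 if mean_prob > 0.5 else -1


if __name__ == "__main__":
    rng = random.Random(0)
    training_ids = list(range(10))
    print(bootstrap_sample(training_ids, rng))  # one resampled training subset

    # Hand-picked decision values from three hypothetical SVMs:
    # one confident "spam" output and two weak "legitimate" outputs.
    values = [2.5, -0.1, -0.2]
    print(majority_vote(values), confidence_fusion(values))
```

With these hand-picked values the two weak outputs outvote the confident one under majority voting, while the averaged posterior leans toward spam. This shows only how a confidence-aware rule can differ from plain voting; it is not evidence for the performance claim in the abstract.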

Authors: Zhang Fuzhi, Zhou Quanqiang

Affiliation: School of Information Science and Engineering, Yanshan University

Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), 2011, No. 4, pp. 591-596 (6 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.

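Funding: Supported by the National Basic Research Program of China (973 Program, No. 2005CB321902), the Natural Science Foundation of Hebei Province (Nos. F2008000877, F2011203219), and the Special Research Project on Rapid Sharing of Scientific Papers in the Network Era, Science and Technology Development Center of the Ministry of Education (No. 20091333110011).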

Keywords: Bookmark Spam, Spam Detection, Support Vector Machine, Confidence, Ensemble Learning

Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]


