
Detection Method of Spam Based on Multi-Features of Micro-Blog (cited by: 3)
Abstract: With the rapid growth of microblog platforms, spam detection and filtering face considerable challenges. Identifying spam accurately and in real time matters both for user experience and for the sustainable development of the platform. Using real data from Sina Weibo, this paper proposes a spam microblog detection method based on multiple features. First, the explicit features of each microblog (user features and content features) are extracted. Second, latent Dirichlet allocation (LDA), a document topic generation model, is used to extract latent topic features. Finally, the extracted features are combined and a classifier is built with a support vector machine (SVM). Experimental results show that the method improves on existing methods in both precision and F1 score.
Authors: ZOU Yong-Pan (邹永潘), LI Wei (李伟), WANG Ru-Jing (王儒敬) (Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China; University of Science and Technology of China, Hefei 230026, China)
Source: Computer Systems & Applications (《计算机系统应用》), 2017, No. 10, pp. 184-189 (6 pages)
Funding: Strategic Priority Research Program of the Chinese Academy of Sciences (XDA08040110)
Keywords: spam microblog detection; latent Dirichlet allocation; support vector machine
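
The three steps named in the abstract (explicit features, LDA topic features, SVM classifier) can be sketched roughly as follows. This is an illustrative sketch using scikit-learn, not the authors' implementation: the toy posts, the particular user and content features, the number of topics, and the RBF kernel are all assumptions made for the example. Real Weibo text would also need Chinese word segmentation before the bag-of-words step.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Made-up toy posts: (text, follower count, followee count, label); 1 = spam.
POSTS = [
    ("win free phone click http://example.com/a now", 12, 900, 1),
    ("cheap discount sale buy now http://example.com/b", 8, 1500, 1),
    ("free coupon click link http://example.com/c win prize", 20, 2000, 1),
    ("buy followers cheap fast http://example.com/d", 5, 1800, 1),
    ("had great noodles with friends tonight", 320, 280, 0),
    ("rainy morning in hefei reading a book", 150, 200, 0),
    ("watching the football match with my brother", 410, 390, 0),
    ("new paper on topic models looks interesting", 600, 350, 0),
]


def explicit_features(text, followers, followees):
    """Hypothetical user and content features for a single post."""
    return [
        followers / (followees + 1),  # user feature: follower/followee ratio
        text.count("http"),           # content feature: number of links
        text.count("@"),              # content feature: number of mentions
        text.count("#"),              # content feature: number of hashtag marks
        len(text.split()),            # content feature: length in tokens
    ]


def build_features(posts, vectorizer, lda, fit):
    """Concatenate explicit features with LDA document-topic proportions."""
    texts = [p[0] for p in posts]
    explicit = np.array([explicit_features(*p[:3]) for p in posts], dtype=float)
    counts = vectorizer.fit_transform(texts) if fit else vectorizer.transform(texts)
    topics = lda.fit_transform(counts) if fit else lda.transform(counts)
    return np.hstack([explicit, topics])


# Steps 1 and 2: explicit features plus latent topic features.
vectorizer = CountVectorizer()
lda = LatentDirichletAllocation(n_components=2, random_state=0)  # 2 topics: assumption
X = build_features(POSTS, vectorizer, lda, fit=True)
y = np.array([p[3] for p in POSTS])

# Step 3: scale the fused features and train an SVM classifier.
scaler = StandardScaler()
clf = SVC(kernel="rbf", C=1.0)  # kernel and C are assumptions
clf.fit(scaler.fit_transform(X), y)

# Classify an unseen (also made-up) post with the already fitted models.
new_post = [("click now for free prize http://example.com/x", 3, 1200, None)]
X_new = build_features(new_post, vectorizer, lda, fit=False)
pred = clf.predict(scaler.transform(X_new))
print(pred)
```

Scaling before the SVM is a design choice of this sketch: the follower ratio and token count live on very different scales from the topic proportions, which sum to 1 per post, and an RBF kernel is sensitive to such differences.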
