
A Novel Online Spam Identification Method Based on User Interest Degree
Abstract: Most online spam identification methods do not effectively distinguish how interested a user is in the content of different emails, which results in low identification precision. This paper proposes a novel online spam identification method based on the support vector machine (SVM). Combining traditional incremental learning with active learning, the method first randomly selects representative samples and, among them, finds the samples whose classification is most uncertain so that they can be labeled manually. It then introduces the concept of user interest degree and proposes a new sample labeling model and a new criterion for evaluating algorithm performance. Finally, it uses the "roulette" method to add the labeled samples to the training set. Several comparative experiments show that the proposed method achieves high spam identification precision and is fast both in training and in selecting the samples to be labeled, which makes it well suited to online use.
Source: Journal of South China University of Technology (Natural Science Edition), 2014, No. 7, pp. 21-27 (7 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: National Science and Technology Achievement Transformation Project (Caijian [2011] 329, Caijian [2012] 258)
Keywords: spam; support vector machine; incremental learning; active learning; user interest
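
The two selection steps named in the abstract, finding the most uncertain samples within a random subset and adding labeled samples by a "roulette" draw, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: a plain linear score stands in for a trained SVM decision function, the roulette weights are assumed to be some per-sample value such as a user interest score, and the function names `decision_value`, `most_uncertain`, and `roulette_select` are hypothetical.

```python
import random


def decision_value(weights, bias, x):
    """Signed score of a linear classifier (stand-in for an SVM decision function).

    A score close to zero means the sample lies near the decision boundary.
    """
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def most_uncertain(pool, weights, bias, subset_size, k, rng):
    """Draw a random candidate subset, then return the k indices nearest the boundary."""
    candidates = rng.sample(range(len(pool)), min(subset_size, len(pool)))
    candidates.sort(key=lambda i: abs(decision_value(weights, bias, pool[i])))
    return candidates[:k]


def roulette_select(items, fitness, n, rng):
    """Pick up to n distinct items; each draw is proportional to the remaining fitness.

    Assumption: fitness is a non-negative weight per item (for example a
    user interest score); the abstract does not specify the weighting.
    """
    items, fitness = list(items), list(fitness)
    chosen = []
    while items and len(chosen) < n:
        total = sum(fitness)
        if total <= 0:
            break  # nothing left with positive weight
        spin = rng.uniform(0, total)
        running = 0.0
        idx = len(items) - 1  # fallback guards against float rounding
        for j, f in enumerate(fitness):
            running += f
            if spin < running:
                idx = j
                break
        chosen.append(items.pop(idx))
        fitness.pop(idx)
    return chosen


rng = random.Random(0)
pool = [[0.1, 0.2], [2.0, 1.0], [-1.5, 0.3], [0.05, -0.1], [1.0, -1.0]]
to_label = most_uncertain(pool, [1.0, 1.0], 0.0, subset_size=5, k=2, rng=rng)
added = roulette_select(to_label, [0.7, 0.3], n=1, rng=rng)
```

Restricting the uncertainty search to a random subset keeps the per-round cost bounded, which is in line with the abstract's emphasis on fast selection of samples to be labeled.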


