A Novel Quick Online Spam Identification Method Based on User Interest Set
Cited by: 2
Abstract: To raise spam identification speed without significantly reducing identification accuracy, a novel method for quick online spam identification is proposed. First, the concepts of a user positive interest set and a user negative interest set are introduced, and emails are classified by combining the user interest sets with a support vector machine (SVM). Second, following active learning theory, the sample density of each category in the training set and an improved angle diversity method are used to find the samples whose classification is most uncertain, and these samples are recommended to the user for labeling. Finally, the user-labeled samples and the most confidently classified samples are added to the training set, and a new sample value evaluation function is used to eliminate redundant samples and generate a new training set. Experimental results show that the proposed method imposes only a small labeling burden on users and identifies spam both accurately and quickly, which demonstrates its practical value for online applications.
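The abstract does not say how the interest sets and the SVM are combined in the first step. The following is a minimal Python sketch under one plausible assumption: the positive and negative interest sets act as a cheap pre-filter, and only emails they cannot decide are passed to a linear SVM. The functions `interest_vote`, `svm_decision` and `classify`, the `margin` parameter, and the example sets and weights are all hypothetical, not taken from the paper.

```python
from typing import Dict, Optional, Set

# Label convention assumed for this sketch: +1 = legitimate mail, -1 = spam.
HAM, SPAM = 1, -1


def interest_vote(tokens: Set[str], positive: Set[str], negative: Set[str],
                  margin: int = 2) -> Optional[int]:
    """Fast path (hypothetical): decide from the interest sets alone, else None.

    score = (#tokens in the positive set) - (#tokens in the negative set);
    only a clear lead of at least `margin` is trusted.
    """
    score = len(tokens & positive) - len(tokens & negative)
    if score >= margin:
        return HAM
    if score <= -margin:
        return SPAM
    return None


def svm_decision(weights: Dict[str, float], bias: float, tokens: Set[str]) -> float:
    """Linear SVM decision value f(x) = w.x + b on binary bag-of-words features."""
    return sum(weights.get(t, 0.0) for t in tokens) + bias


def classify(tokens: Set[str], positive: Set[str], negative: Set[str],
             weights: Dict[str, float], bias: float, margin: int = 2) -> int:
    vote = interest_vote(tokens, positive, negative, margin)
    if vote is not None:
        return vote  # decided by set lookups only; the SVM is skipped
    return HAM if svm_decision(weights, bias, tokens) >= 0 else SPAM


# Usage with made-up interest sets and made-up trained SVM weights.
positive = {"meeting", "report", "project"}
negative = {"lottery", "winner", "prize"}
weights = {"free": -1.5, "click": -1.0, "invoice": 0.8}
bias = 0.2

print(classify({"project", "report", "meeting"}, positive, negative, weights, bias))  # 1
print(classify({"lottery", "winner", "free"}, positive, negative, weights, bias))     # -1
print(classify({"free", "click", "report"}, positive, negative, weights, bias))       # -1 (SVM path)
```

Set intersections cost far less than evaluating an SVM, especially a kernel SVM, so a front end of this kind is one way interest sets could raise throughput; whether the paper orders the two stages this way is not stated in the abstract.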
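The second step builds on angle diversity (Brinker, ICML 2003), a batch active learning criterion for SVMs that prefers samples that are both close to the separating hyperplane and far apart in angle from samples already chosen. The sketch below implements only that standard criterion with a linear kernel. The paper's training-set density term and its "improved" variant are not specified in the abstract and are not reproduced. `angle_diverse_batch` and the trade-off weight `lam` are illustrative names.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def decision(w: Vector, b: float, x: Vector) -> float:
    """Linear SVM decision value f(x) = w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def abs_cosine(u: Vector, v: Vector) -> float:
    """|cos| of the angle between u and v under a linear kernel."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return abs(sum(a * c for a, c in zip(u, v))) / (nu * nv)


def angle_diverse_batch(pool: List[Vector], w: Vector, b: float,
                        batch_size: int, lam: float = 0.5) -> List[int]:
    """Greedily choose pool indices to send to the user for labeling.

    Each step takes the candidate minimising
        lam * |f(x)| + (1 - lam) * max_{s in batch} |cos(x, s)|,
    i.e. close to the hyperplane (uncertain) and angularly unlike the
    samples already in the batch (diverse).
    """
    batch: List[int] = []
    remaining = list(range(len(pool)))
    while remaining and len(batch) < batch_size:
        def score(i: int) -> float:
            diversity = max((abs_cosine(pool[i], pool[j]) for j in batch), default=0.0)
            return lam * abs(decision(w, b, pool[i])) + (1 - lam) * diversity
        best = min(remaining, key=score)
        batch.append(best)
        remaining.remove(best)
    return batch


w, b = [1.0, 0.0], 0.0
pool = [
    [0.1, 1.0],   # near the hyperplane
    [0.12, 0.9],  # also near it, but almost parallel to sample 0
    [0.3, -0.3],  # slightly farther, pointing in a different direction
    [2.0, 0.5],   # confidently classified
]
print(angle_diverse_batch(pool, w, b, batch_size=2))           # [0, 2]
print(angle_diverse_batch(pool, w, b, batch_size=2, lam=1.0))  # [0, 1]: uncertainty only
```

With `lam=1.0` the criterion reduces to plain margin-based uncertainty sampling and asks the user about two nearly identical emails; the diversity term swaps the redundant sample 1 for sample 2.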
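The abstract does not give the new sample value evaluation function used in the final step. As a stand-in (an assumption, not the authors' function), this sketch adds the user-labeled samples plus the `n_confident` pool samples with the largest |f(x)|, pseudo-labeled by the sign of f, and then drops near-duplicates. Candidates are visited in ascending order of the functional margin y·f(x), because under the SVM KKT conditions only samples with y·f(x) ≤ 1 can be support vectors, and a candidate is discarded when an already-kept sample with the same label has cosine similarity of at least `sim_threshold`. `update_training_set` and its parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Sample = Tuple[Vector, int]  # (feature vector, label in {+1, -1})


def decision(w: Vector, b: float, x: Vector) -> float:
    """Linear SVM decision value f(x) = w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def cosine(u: Vector, v: Vector) -> float:
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * c for a, c in zip(u, v)) / (nu * nv)


def update_training_set(train: List[Sample], labeled: List[Sample],
                        pool: List[Vector], w: Vector, b: float,
                        n_confident: int, sim_threshold: float = 0.95) -> List[Sample]:
    # 1) Most confidently classified pool samples, pseudo-labeled by sign(f).
    ranked = sorted(range(len(pool)), key=lambda i: abs(decision(w, b, pool[i])), reverse=True)
    confident = [(pool[i], 1 if decision(w, b, pool[i]) >= 0 else -1)
                 for i in ranked[:n_confident]]
    candidates = list(train) + list(labeled) + confident

    # 2) Visit candidates from the smallest functional margin y*f(x) upward
    #    (a stable sort keeps the input order for ties).
    candidates.sort(key=lambda s: s[1] * decision(w, b, s[0]))

    # 3) Hypothetical redundancy rule: skip a candidate that nearly duplicates
    #    an already-kept sample with the same label.
    kept: List[Sample] = []
    for x, y in candidates:
        if any(ky == y and cosine(x, kx) >= sim_threshold for kx, ky in kept):
            continue
        kept.append((x, y))
    return kept


w, b = [1.0, 0.0], 0.0
train = [([1.0, 0.0], 1), ([-1.0, 0.2], -1)]
labeled = [([0.2, 1.0], -1)]  # the user marked this email as spam although f(x) > 0
pool = [[3.0, 0.1], [0.05, 0.5], [1.5, 1.5]]
new_train = update_training_set(train, labeled, pool, w, b, n_confident=2)
print(new_train)  # [3.0, 0.1] is dropped as a near-duplicate of [1.0, 0.0]
```

The misclassified, user-labeled email is visited first because it has the smallest margin; the confident sample [1.5, 1.5] survives because its direction differs from the existing positive sample.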
Source: Acta Electronica Sinica (《电子学报》), 2015, No. 10, pp. 1963-1970 (8 pages). Indexed in EI, CAS, CSCD and the Peking University Core Journals list.
Funding: National Science and Technology Achievement Transformation Project (财建[2011]329; 财建[2012]258)
Keywords: spam; user interest set; support vector machine; active learning; online application
References (15 in total; the first 10 are listed)

[1] Liu W Y, Wang T. Online active multi-field learning for efficient email spam filtering[J]. Knowledge and Information Systems, 2012, 33(1): 117-136.
[2] Bertini J R, Zhao L, Lopes A A. An incremental learning algorithm based on the K-associated graph for non-stationary data classification[J]. Information Sciences, 2013, 246: 52-68.
[3] Costa J, Silva C, Antunes M, Ribeiro B. Customized crowds and active learning to improve classification[J]. Expert Systems with Applications, 2013, 40(18): 7212-7219.
[4] Syed N A, Liu H, Huan S, et al. Handling concept drifts in incremental learning with support vector machines[A]. Proceedings of the Workshop on Support Vector Machines at the International Joint Conference on Artificial Intelligence[C]. Stockholm, Sweden, 1999: 317-321.
[5] Wu C M, Wang X D, Bai D Y, et al. Fast incremental learning algorithm of SVM on KKT conditions[A]. Sixth International Conference on Fuzzy Systems and Knowledge Discovery[C]. Tianjin, China: IEEE Press, 2009: 551-554.
[6] Amayri O, Bouguila N. A study of spam filtering using support vector machines[J]. Artificial Intelligence Review, 2010, 34(1): 73-108.
[7] Tong S, Chang E. Support vector machine active learning for image retrieval[A]. Proceedings of the 9th ACM International Conference on Multimedia[C]. New York, USA: ACM, 2001: 107-118.
[8] Hu L S, Lu S X, Wang X Z. A new and informative active learning approach for support vector machine[J]. Information Sciences, 2013, 244: 142-160.
[9] Leng Y, Xu X Y, Qi G H. Combining active learning and semi-supervised learning to construct SVM classifier[J]. Knowledge-Based Systems, 2013, 44(5): 121-131.
[10] Chen R, Cao Y F, Sun H. Multi-class image classification with active learning and semi-supervised learning[J]. Acta Automatica Sinica (自动化学报), 2011, 37(8): 954-962.
