A Novel Online Spam Identification Method Based on User Interest Degree

Cited by: 4

Abstract: Most online spam identification methods do not effectively distinguish how interested a user is in the content of different emails, which lowers identification precision. This paper proposes a novel online spam identification method based on the support vector machine (SVM). First, drawing on the theories of incremental learning and active learning, representative samples are randomly selected from the training set in order to find the samples whose classification is most uncertain, and these are labeled manually by the user. Next, the concept of user interest degree is introduced, and a new sample labeling model and a new criterion for evaluating algorithm performance are proposed. Finally, the "roulette" method is used to add the labeled samples to the training set. Comparative experiments show that the proposed method achieves high spam identification precision and is fast both in training and in selecting the samples to be labeled, which makes it well suited to online use.
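The abstract names two selection steps without giving formulas: finding the most uncertain sample among randomly drawn candidates, and admitting labeled samples with the "roulette" method. The following is a minimal Python sketch of those two ideas only, not the authors' algorithm. It assumes a linear decision function whose `weights` and `bias` are already trained, measures uncertainty as distance to the decision boundary, and treats the roulette weight of each sample as a hypothetical score (`fitness`); none of these names or choices come from the paper.

```python
import random


def decision_value(weights, bias, features):
    """Linear decision value w.x + b (weights assumed to be already trained)."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def most_uncertain(weights, bias, pool, sample_size, rng):
    """Randomly draw candidates, then return the one closest to the boundary.

    A small |w.x + b| means the classifier is least sure about the sample,
    so it is the one worth sending to the user for labeling.
    """
    candidates = rng.sample(pool, min(sample_size, len(pool)))
    return min(candidates, key=lambda x: abs(decision_value(weights, bias, x)))


def roulette_select(items, fitness, rng):
    """Roulette-wheel selection: pick one item with probability
    proportional to its (hypothetical) fitness score."""
    total = sum(fitness)
    if total <= 0:
        raise ValueError("fitness values must have a positive sum")
    threshold = rng.random() * total
    cumulative = 0.0
    for item, fit in zip(items, fitness):
        cumulative += fit
        if threshold < cumulative:
            return item
    return items[-1]  # guard against floating-point round-off


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy two-feature emails; the values are illustrative only.
    pool = [(0.9, 0.1), (0.5, 0.45), (0.1, 0.8), (0.2, 0.2)]
    query = most_uncertain((1.0, -1.0), 0.0, pool, sample_size=4, rng=rng)
    print("ask the user to label:", query)
    print("admit to training set:", roulette_select(pool, [0.1, 0.4, 0.2, 0.3], rng))
```

Drawing a random candidate subset before ranking keeps the per-query cost bounded by `sample_size` rather than by the size of the unlabeled pool, which is consistent with the speed claim in the abstract, though the paper's actual selection rule may differ.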

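Authors: Wang Youwei, Liu Yuanning, Feng Lizhou, Zhu Xiaodong

Affiliation: College of Computer Science and Technology, Jilin University

Source: Journal of South China University of Technology (Natural Science Edition), 2014, No. 7, pp. 21-27 (7 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.

Funding: National Science and Technology Achievement Transformation Project (财建[2011]329, 财建[2012]258)

Keywords: spam; support vector machine; incremental learning; active learning; user interest

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]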
