Abstract
Current spam filtering algorithms based on artificial immune systems rarely account for noise word attacks. To address this, an immune-based training algorithm for spam filtering models that resists noise word attacks, named ANWAIS, is proposed. During gene library generation, the algorithm uses the mutual information difference as its evaluation function to filter out good words that appear in spam and spam words that appear in normal (legitimate) email, so that the gene library better reflects the characteristics of spam. During antibody updating, it maintains a discard word table to preserve the purity of the gene library. Simulation experiments show that ANWAIS produces higher-quality antibodies and achieves better classification performance than spam filtering algorithms that do not consider noise word attacks.
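The abstract does not give the exact formula, so the Python sketch below illustrates the two mechanisms under one plausible reading. It assumes pointwise mutual information MI(w, c) = log P(w|c)/P(w), a mutual information difference MID(w) = MI(w, spam) - MI(w, ham), document-frequency estimates with Laplace smoothing, a threshold of 0, and antibodies represented as plain word sets. The names `doc_freq`, `mi_difference`, `build_gene_library` and `update_antibody` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def doc_freq(docs):
    """Number of documents each word occurs in (repeats inside one document count once)."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc))
    return counts


def mi_difference(spam_docs, ham_docs, alpha=1.0):
    """Assumed MID(w) = MI(w, spam) - MI(w, ham), with MI(w, c) = log P(w|c) / P(w).

    P(w) cancels in the difference, leaving log P(w|spam) / P(w|ham).
    P(w|c) is a document frequency with additive (Laplace) smoothing alpha.
    """
    spam_df, ham_df = doc_freq(spam_docs), doc_freq(ham_docs)
    n_spam, n_ham = len(spam_docs), len(ham_docs)
    scores = {}
    for w in set(spam_df) | set(ham_df):
        p_w_spam = (spam_df[w] + alpha) / (n_spam + 2 * alpha)
        p_w_ham = (ham_df[w] + alpha) / (n_ham + 2 * alpha)
        scores[w] = math.log(p_w_spam / p_w_ham)
    return scores


def build_gene_library(spam_docs, ham_docs, threshold=0.0):
    """Keep spam-indicative words as genes; every other word goes to the discard table.

    Good words injected into spam (and words seen mostly in ham) score <= threshold.
    """
    genes, discarded = set(), set()
    for w, score in mi_difference(spam_docs, ham_docs).items():
        if score > threshold:
            genes.add(w)
        else:
            discarded.add(w)
    return genes, discarded


def update_antibody(antibody, new_spam_words, discarded):
    """Hypothetical update: add words from newly seen spam, never re-admitting discarded words."""
    return antibody | (set(new_spam_words) - discarded)


# Toy data: "hello" and "meeting" act as noise (good) words injected into spam.
spam_docs = [["free", "money", "hello"], ["free", "win", "meeting"], ["money", "win"]]
ham_docs = [["hello", "meeting", "project"], ["meeting", "report"], ["hello", "project"]]

genes, discarded = build_gene_library(spam_docs, ham_docs)
print(sorted(genes))      # ['free', 'money', 'win']
print(sorted(discarded))  # ['hello', 'meeting', 'project', 'report']

antibody = update_antibody({"free"}, ["prize", "hello"], discarded)
print(sorted(antibody))   # ['free', 'prize']
```

Keeping the discard table as a set makes the check during antibody updates a constant-time lookup, so a noise word rejected once cannot re-enter the gene library through later spam.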
Source
Computer Engineering & Science (《计算机工程与科学》), 2013, No. 12, pp. 173-177 (5 pages)
Indexed in: CSCD; Peking University Core Journals
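Funding
Key Science and Technology Research Project of the Education Department of Henan Province (12B520056, 13B520253)
Youth Fund Project of the Physical Education College of Zhengzhou University (2011C3003)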
Keywords
artificial immune
noise word attack
spam filter
mutual information difference
gene library