Spam filtering based on modified stack auto-encoder (cited by: 7)
Abstract: Stacked auto-encoders (SA) are prone to overfitting, which lowers spam classification accuracy. To address this, an improved SA method based on dynamic dropout is proposed. First, the particular characteristics of the spam classification problem are analyzed and the dropout algorithm is introduced into the SA. Second, because conventional dropout tends to leave some nodes switched off for long stretches of training, a dynamic dropout algorithm is proposed in which a dynamic function replaces the conventional static dropout rate with one that decreases gradually as the number of iterations grows. Finally, dynamic dropout is used to improve the pre-training model of the SA. In simulations comparing against a Support Vector Machine (SVM) and a Back Propagation (BP) neural network, the improved SA reaches an average accuracy of 97.66%, and its Matthews correlation coefficient exceeds 89% on every dataset. Compared with the conventional SA, the Matthews correlation coefficient of the improved SA on the Error1-6 datasets is higher by 3.27%, 1.68%, 2.16%, 1.51%, 1.58%, and 1.07%, respectively. The experimental results indicate that the improved SA based on dynamic dropout achieves higher classification accuracy and better stability.
Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal (北大核心), 2016, No. 1, pp. 158-162, 193 (6 pages)
Funding: National Science and Technology Major Project (2015ZX01040101-002); National Natural Science Foundation of China (91338107)
Keywords: deep learning; Stack Auto-encoder (SA); dropout; Support Vector Machine (SVM); spam; classification
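
The abstract describes two ideas that a short sketch may make more concrete: a dropout rate that shrinks as training proceeds, and the Matthews correlation coefficient used to report results. The record does not give the paper's dynamic function, so the exponential decay below, the names `dynamic_dropout_rate`, `apply_dropout`, and `matthews_corrcoef`, and the default values `initial_rate=0.5` and `decay=0.01` are all assumptions made for illustration, not the authors' method. The coefficient follows the standard confusion-matrix definition.

```python
import math
import random


def dynamic_dropout_rate(step, initial_rate=0.5, decay=0.01):
    """Hypothetical schedule: the dropout rate decreases with the training step.

    The abstract only states that the rate decreases with the iteration
    count; the exponential form and both defaults are assumptions.
    """
    return initial_rate * math.exp(-decay * step)


def apply_dropout(activations, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    # A unit survives when its random draw is at least `rate`.
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when any marginal count is zero
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    rng = random.Random(0)
    hidden = [1.0] * 8  # stand-in for one layer's activations
    for step in (0, 100, 500):
        rate = dynamic_dropout_rate(step)
        print(step, round(rate, 4), apply_dropout(hidden, rate, rng))
```

With a schedule of this kind, many units are dropped early in pre-training and almost none late, so no unit stays switched off for the whole run. Any monotonically decreasing function could be substituted for the exponential.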