Abstract
Stacked auto-encoders (SA) overfit easily, which lowers spam classification accuracy. To address this, an improved SA method based on dynamic dropout is proposed. First, the particular characteristics of the spam classification problem are analyzed and the dropout algorithm is introduced into the SA. Second, because conventional dropout tends to leave some nodes in the dropped (inactive) state for long periods, a dynamic dropout algorithm is proposed in which a dynamic function replaces the conventional static dropout rate with one that gradually decreases as the number of iterations grows. Finally, the dynamic dropout algorithm is used to improve the pre-training model of the SA. Simulation results show that, compared with Support Vector Machine (SVM) and Back Propagation (BP) neural network classifiers, the improved SA reaches an average accuracy of 97.66%, and its Matthews correlation coefficient exceeds 89% on every dataset. Compared with the conventional SA, the Matthews correlation coefficient of the improved SA on the Error1-6 datasets is higher by 3.27%, 1.68%, 2.16%, 1.51%, 1.58%, and 1.07%, respectively. These results indicate that the improved SA based on dynamic dropout achieves higher classification accuracy and better stability.
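The idea of a dropout rate that shrinks over training can be illustrated with a minimal Python sketch. The abstract does not state which dynamic function is used, so the exponential decay below, the names `dynamic_dropout_rate` and `apply_dropout`, and the parameters `initial_rate` and `decay` are all assumptions made for illustration, not the authors' formulation.

```python
import math
import random


def dynamic_dropout_rate(step, initial_rate=0.5, decay=0.01):
    """Hypothetical schedule: the dropout rate decays exponentially with the step.

    The exponential form and both default values are assumptions; the
    abstract only says the rate decreases as iterations proceed.
    """
    if not 0.0 <= initial_rate < 1.0:
        raise ValueError("initial_rate must be in [0, 1)")
    return initial_rate * math.exp(-decay * step)


def apply_dropout(activations, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`,
    and rescale the surviving units by 1 / (1 - rate)."""
    if rate <= 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
hidden = [1.0] * 8  # stand-in for one hidden layer's activations
for step in (0, 100, 500):
    rate = dynamic_dropout_rate(step)
    print(step, round(rate, 4), apply_dropout(hidden, rate, rng))
```

Early steps drop about half of the units, while later steps drop very few, so no unit is likely to stay switched off throughout training. Any monotonically decreasing schedule could be substituted for the exponential one used here.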
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journals (北大核心)
2016, No. 1, pp. 158-162, 193 (6 pages)
Funding
National Science and Technology Major Project (2015ZX01040101-002)
National Natural Science Foundation of China (91338107)