Abstract
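In email classification research, spam identification has become a pressing problem. Spam is increasingly widespread, disrupts normal work, and has attracted wide attention from researchers. Because email features are very high-dimensional, traditional classification methods are slow and have low accuracy. To speed up email classification, improve its accuracy, and filter out spam more effectively, an automatic email classification method based on the support vector machine (SVM) is proposed. First, the mutual information method extracts email keywords as classification features, and the optimal features are selected to speed up classification. Next, an SVM model is trained on these features to build an optimal email classifier. Finally, the classifier is applied to the email test set. In simulations on the UCI spam database, the SVM's recognition accuracy is far higher than that of a neural network and its classification is markedly faster, so it separates spam from legitimate email well. The SVM classification method is therefore an effective approach to email classification and helps eliminate spam.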
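The mutual-information keyword selection step can be sketched as follows. This is a minimal illustration, not the paper's code. It assumes each email is already tokenized into a set of words, measures I(T;C) in bits between the binary presence of a term T and the class label C, and keeps the k highest-scoring terms. The function names `mutual_information` and `select_keywords` and the toy emails are hypothetical.

```python
import math
from collections import Counter


def mutual_information(docs, labels, term):
    """Mutual information I(T;C), in bits, between the presence of `term`
    in an email (T in {True, False}) and the email's class label C."""
    n = len(docs)
    joint = Counter((term in doc, label) for doc, label in zip(docs, labels))
    term_marginal = Counter(term in doc for doc in docs)
    class_marginal = Counter(labels)
    mi = 0.0
    for (present, label), count in joint.items():
        p_tc = count / n
        p_t = term_marginal[present] / n
        p_c = class_marginal[label] / n
        # Only observed (T, C) pairs contribute; 0 * log 0 is taken as 0.
        mi += p_tc * math.log2(p_tc / (p_t * p_c))
    return mi


def select_keywords(docs, labels, k):
    """Return the k terms with the highest mutual information
    (ties broken alphabetically so the result is deterministic)."""
    vocabulary = set().union(*docs)
    ranked = sorted(vocabulary,
                    key=lambda term: (-mutual_information(docs, labels, term), term))
    return ranked[:k]


# Hypothetical toy corpus: each email is a set of tokens.
emails = [{"free", "win", "money"}, {"free", "offer"},
          {"meeting", "agenda"}, {"project", "meeting"}]
labels = ["spam", "spam", "ham", "ham"]
print(select_keywords(emails, labels, 2))  # -> ['free', 'meeting']
```

Here "free" and "meeting" each split the toy corpus perfectly by class, so each carries the full 1 bit of class information, while terms seen in only one email score lower.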
The volume of junk email on the Internet has grown tremendously in the past few years, and this problem has attracted the attention of many researchers. Because of the diversity of email content and the high dimensionality of email features, traditional classification methods are slow and inaccurate in large-scale practical email classification. To improve classification accuracy, an email classification method based on the support vector machine (SVM) is proposed. The email classification task consists of feature extraction and classification: the mutual information method is used to extract key features of emails, and an SVM is used as the classifier. Simulation experiments on nine classes of email show that the SVM achieves an average classification accuracy of 89.9%. Compared with a back-propagation neural network (BPNN), classification performance is improved by 4%. The experimental results indicate that SVM is a useful method for email classification.
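The SVM training step can be sketched in the same spirit. The abstract does not state the kernel, solver, or parameters, so this sketch assumes a linear SVM over binary keyword-presence vectors, trained with the Pegasos stochastic sub-gradient method (a swapped-in solver chosen for brevity). The parameters `lam` and `steps`, the two hypothetical keywords, and the toy data are illustrative assumptions.

```python
import random


def train_linear_svm(X, y, lam=0.1, steps=1000, seed=0):
    """Pegasos-style stochastic sub-gradient descent on the primal objective
    lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * <w, x_i>), with y_i in {+1, -1}.
    A constant 1.0 feature is appended so the last weight acts as a bias
    (a common simplification: that bias is regularized as well)."""
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    for t in range(1, steps + 1):
        i = rng.randrange(len(data))        # sample one training email uniformly
        xi, yi = data[i], y[i]
        eta = 1.0 / (lam * t)               # Pegasos step size
        margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
        w = [(1.0 - eta * lam) * wj for wj in w]   # step from the regularizer
        if margin < 1.0:                           # hinge loss active for this sample
            w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, x):
    """+1 (spam) if the decision value is non-negative, else -1 (legitimate)."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical binary keyword-presence vectors over ["free", "meeting"].
X = [[1, 0], [1, 0], [0, 1], [0, 1]]
y = [1, 1, -1, -1]          # +1 = spam, -1 = legitimate
w = train_linear_svm(X, y)
print(predict(w, [1, 0]), predict(w, [0, 1]))  # -> 1 -1
```

This binary sketch does not reproduce the nine-class experiment; a multi-class setting would typically combine several binary classifiers, for example one-vs-rest.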
Source
Computer Simulation (《计算机仿真》)
CSCD
Peking University Core Journal (北大核心)
2011, Issue 8, pp. 156-158, 195 (4 pages)
Keywords
Email
Support vector machine (SVM)
Classification
Feature extraction