Abstract
Most text data are high-dimensional and not linearly separable. For Chinese email, this paper first describes the relevant email preprocessing methods and uses TF-IDF to vectorize the messages. It then analyzes several commonly used kernel functions as applied to SVM-based spam filtering. The characteristics of global and local kernel functions are described, and the spam classification accuracy of a global kernel, the polynomial (Poly) kernel, is compared with that of a local kernel, the radial basis function (RBF) kernel. Based on this analysis, the two kernels are combined. Experiments show that the combined kernel outperforms either single kernel and has good learning and generalization ability.
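The abstract does not state how the two kernels are combined. As a minimal sketch, assuming the common choice of a convex weighted sum (which remains a valid kernel because each term is one), the combination could look like the following. The function names, the `weight` parameter, and the default hyperparameters are illustrative assumptions, not values from the paper.

```python
import math


def poly_kernel(x, y, degree=2, coef0=1.0):
    """Global kernel: (x . y + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    """Local kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def combined_kernel(x, y, weight=0.5, degree=2, coef0=1.0, gamma=0.5):
    """Assumed combination: weight * Poly + (1 - weight) * RBF."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (weight * poly_kernel(x, y, degree, coef0)
            + (1.0 - weight) * rbf_kernel(x, y, gamma))


def gram_matrix(samples, kernel):
    """Pairwise kernel values, e.g. for an SVM that accepts a precomputed kernel."""
    return [[kernel(a, b) for b in samples] for a in samples]
```

Here `weight` trades off the global behavior of the polynomial kernel against the local behavior of the RBF kernel. The vectors passed in would be the TF-IDF representations of the emails.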
Source
《电子设计工程》
2015, Issue 11, pp. 51-53 (3 pages)
Electronic Design Engineering
Keywords
global kernel function
local kernel function
hybrid kernel function
support vector machine