Abstract
A naive Bayes algorithm improved with a support vector machine, called TSVM-NB, is proposed. The sample set is first trained with the naive Bayes (NB) algorithm, and a support vector machine (SVM) is then used to construct an optimal separating hyperplane. Each sample is kept or discarded according to whether its nearest neighboring sample belongs to the same class, which both reduces the size of the sample space and increases the independence of each sample's class. Finally, the naive Bayes algorithm is applied again to the reduced sample set to generate the classification model. Simulation results show that the algorithm eliminates redundant attributes while trimming the sample space, obtains the classification feature subset quickly, and improves the classification speed, recall, and accuracy of spam filtering.
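The sample-trimming idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only the nearest-neighbor pruning and the final naive Bayes training, and it leaves out the SVM hyperplane step because the abstract does not say how the hyperplane enters the selection. The Gaussian likelihood, the Euclidean distance, the two-dimensional toy features, and every function name (`prune_by_nearest_neighbor`, `train_gaussian_nb`, `predict`) are assumptions made for illustration.

```python
import math
from collections import defaultdict


def prune_by_nearest_neighbor(samples, labels):
    """Keep a sample only if its nearest other sample has the same label."""
    kept = []
    for i, x in enumerate(samples):
        nearest = min(
            (j for j in range(len(samples)) if j != i),
            key=lambda j: math.dist(x, samples[j]),
        )
        if labels[nearest] == labels[i]:
            kept.append(i)
    return [samples[i] for i in kept], [labels[i] for i in kept]


def train_gaussian_nb(samples, labels, var_floor=1e-3):
    """Return {label: (log_prior, means, variances)} for a Gaussian NB model."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    model = {}
    for y, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # var_floor avoids zero variance on tiny classes (assumed value).
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[y] = (math.log(n / len(samples)), means, variances)
    return model


def predict(model, x):
    """Return the label with the highest log posterior for feature vector x."""
    def log_posterior(params):
        log_prior, means, variances = params
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances)
        )
    return max(model, key=lambda y: log_posterior(model[y]))


# Hypothetical toy data: two tight clusters plus one mislabeled point.
X = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),   # ham cluster
    (2.0, 2.0), (2.1, 2.0), (2.0, 2.1), (2.1, 2.1),   # spam cluster
    (0.5, 0.5),                                        # labeled spam, sits near ham
]
y = ["ham"] * 4 + ["spam"] * 4 + ["spam"]

X_kept, y_kept = prune_by_nearest_neighbor(X, y)
model = train_gaussian_nb(X_kept, y_kept)
```

In this toy set the point `(0.5, 0.5)` is dropped because its nearest neighbor is a ham sample, so the final model is trained on the eight remaining samples. Real spam filtering would work on text features such as word counts, where a multinomial likelihood is the more usual choice.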
Authors
YANG Lei (杨雷), CAO Cui-ling (曹翠玲), SUN Jian-guo (孙建国), ZHANG Li-guo (张立国)
College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China; Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
Source
Journal on Communications (《通信学报》), 2017, No. 4, pp. 140-148 (9 pages)
Indexed in: EI, CSCD, Peking University Core Journals
Funding
National Natural Science Foundation of China (No. 61202455, No. 61472096)