Abstract
J48, Naive Bayes Multinomial, and SMO are commonly used algorithms for text classification. Using the Weka platform, classification experiments were run on the Reuters dataset with these three algorithms. The results of six experiments were analyzed using precision, recall, and the combined F-Measure, together with other text classification evaluation metrics, and SMO was found to outperform the other two algorithms. For the Naive Bayes Multinomial algorithm, the numToSelect value was adjusted to improve its results. The experiments offer a reference for research on text classification.
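The abstract names precision, recall, and F-Measure as its main evaluation metrics but gives no formulas. The following is a minimal sketch of how the three values are computed for one class, assuming the standard definitions. The function name `precision_recall_f1` and the counts in the usage lines are hypothetical illustrations, not figures from the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F-Measure) for a single class.

    tp: documents correctly assigned to the class (true positives)
    fp: documents wrongly assigned to the class (false positives)
    fn: documents of the class that were missed (false negatives)
    """
    # Precision: share of predicted positives that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-Measure: harmonic mean of precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical counts, for illustration only (not results from the paper).
p, r, f = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f} F-Measure={f:.3f}")
```

Because the F-Measure is a harmonic mean, it stays low when either precision or recall is low, which is why it serves as a single combined score when comparing classifiers such as J48, Naive Bayes Multinomial, and SMO.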
Author
LI Mei (李梅), School of Information Engineering, Huainan Union University, Huainan, Anhui Province 232001
Source
Journal of Chuxiong Normal University (《楚雄师范学院学报》)
2020, No. 3, pp. 115-119 (5 pages)
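Funding
Anhui Provincial Natural Science Research Project of Higher Education Institutions (No. KJ2019A0456)
Anhui Provincial Natural Science Research Project of Higher Education Institutions (No. KJ2019A0664)
Anhui Provincial Natural Science Research Project of Higher Education Institutions (No. KJ2017A585)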