Abstract
Feature selection, weight calculation, and classification each have several classic methods in text categorization, so for a practical Chinese text categorization task it is worth studying how to find a good combination of methods across these three stages. This paper compares how different combinations of classic methods at each stage affect classification performance in Chinese text categorization. The results show that the combination of CHI feature selection, TFIDF weight calculation, and SVM classification performed best.
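The first two stages named in the abstract can be illustrated with a minimal sketch. It assumes the textbook definitions of the CHI (chi-square) statistic and of TFIDF with a plain logarithmic IDF; the abstract does not state which variants the paper used. The toy corpus, the function names, and the pre-segmented tokens are all hypothetical, and the SVM stage is omitted because it would normally come from an existing library.

```python
import math
from collections import Counter


def chi_square(docs, labels, term, cls):
    """CHI statistic of `term` for class `cls`, from a 2x2 contingency table."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_cls = label == cls
        if has_term and in_cls:
            a += 1          # term present, document in class
        elif has_term:
            b += 1          # term present, document in another class
        elif in_cls:
            c += 1          # term absent, document in class
        else:
            d += 1          # term absent, document in another class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_features(docs, labels, k):
    """Keep the k terms with the highest CHI score (maximum over classes)."""
    vocab = sorted({t for tokens in docs for t in tokens})
    classes = sorted(set(labels))
    score = {t: max(chi_square(docs, labels, t, c) for c in classes)
             for t in vocab}
    return sorted(vocab, key=lambda t: (-score[t], t))[:k]


def tfidf_vectors(docs, features):
    """Weight each selected term by raw term frequency times log(N / df)."""
    n = len(docs)
    df = {t: sum(1 for tokens in docs if t in tokens) for t in features}
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        vectors.append([tf[t] * math.log(n / df[t]) if df[t] else 0.0
                        for t in features])
    return vectors


# Hypothetical toy corpus; Chinese text is assumed to be word-segmented already.
docs = [
    ["足球", "比赛", "球队"],
    ["球队", "进球", "比赛"],
    ["股票", "市场", "上涨"],
    ["市场", "股票", "下跌"],
]
labels = ["sports", "sports", "finance", "finance"]

selected = select_features(docs, labels, k=4)
vectors = tfidf_vectors(docs, selected)
```

The resulting `vectors` are the kind of input an SVM classifier would then be trained on; any normalization or smoothing of the weights is a further design choice not covered here.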
Source
《安庆师范学院学报(自然科学版)》
2015, Issue 2, pp. 49-53 (5 pages)
Journal of Anqing Teachers College (Natural Science Edition)
Keywords
text categorization
feature selection
weight calculation
classification algorithms