Abstract
Using the Reuters-21578 financial news corpus as the experimental dataset, this paper compares two text representations: the mainstream bag-of-words (BOW) representation and the language-independent, string-based n-gram representation. It measures their classification performance with k-nearest neighbor (kNN) and support vector machine (SVM) classifiers. Statistical analysis of the experimental results shows no significant difference between the classification results obtained with the two representations.
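As a rough illustration, and not the paper's code, the two representations being compared can be sketched in Python. The tokenizer regex, the choice of character trigrams (n = 3) and the cosine helper are assumptions made for this sketch. The paper does not say which n, weighting or similarity measure it used.

```python
import math
import re
from collections import Counter


def bow(text: str) -> Counter:
    """Bag-of-words: counts of lowercase word tokens; word order is discarded."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Language-independent representation (assumed here to be character
    n-grams): counts of overlapping n-character substrings of the raw string,
    so no word tokenizer is required."""
    s = " ".join(text.lower().split())  # collapse runs of whitespace
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors, one similarity a
    kNN classifier could use (an assumption, not taken from the paper)."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


if __name__ == "__main__":
    print(bow("Oil prices rose"))
    print(char_ngrams("Oil prices rose"))
    # "price" vs "prices": BOW treats them as different words,
    # while their character trigrams overlap heavily.
    print(cosine(bow("oil price"), bow("oil prices")))
    print(cosine(char_ngrams("oil price"), char_ngrams("oil prices")))
```

Either Counter can be mapped to a sparse feature vector and passed to a kNN or SVM classifier. In this toy comparison, the trigram vectors for "oil price" and "oil prices" have a cosine of about 0.94, while the BOW vectors score 0.5.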
Source
Computer Engineering (《计算机工程》)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2004, No. 18, pp. 124-126 (3 pages)
Keywords
Text categorization
Text representation
Support vector machines
K-nearest neighbor
Rank sum test