Comparison of Two Text Representation Methods in Automatic Text Categorization (cited by: 6)

Comparison of Two Text Representation Methods
Abstract: Using the Reuters-21578 financial newswire corpus as the experimental dataset, this paper compares the mainstream bag-of-words (BOW) text representation with the language-independent character-string n-gram representation, measuring classification performance under k-nearest neighbor and support vector machine classifiers. Statistical analysis of the experimental results shows no significant difference between the classification results obtained with the two representations.
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2004, No. 18, pp. 124-126 (3 pages)
Keywords: text categorization; text representation; support vector machines; k-nearest neighbor; rank sum test
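
The two representations compared in the abstract can be illustrated with a minimal sketch. It assumes English text, raw term counts, and character trigrams (n = 3); the record does not state the tokenization, term weighting, or value of n used in the paper, so the function names `bow_features`, `char_ngram_features`, and `cosine` and all of their parameters are hypothetical choices made only for illustration.

```python
import math
import re
from collections import Counter


def bow_features(text):
    """Bag-of-words: count lowercase alphanumeric word tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tokens)


def char_ngram_features(text, n=3):
    """Character n-grams: count every length-n substring of the
    lowercased, whitespace-normalized text (no word segmentation needed)."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors,
    a common similarity measure for k-nearest neighbor classification."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


doc = "Oil prices rise"
bow = bow_features(doc)               # features are words
grams = char_ngram_features(doc, 3)   # features are overlapping substrings
```

Either feature map can be passed to the same classifier, which is what makes a like-for-like comparison of the two representations possible. The n-gram variant needs no language-specific tokenizer or stemmer, which is the sense in which the abstract calls it language independent.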
