
An Improved k-Nearest Neighbor Method in Automatic Text Categorization (Chinese title: 一种k-NN文本分类器的改进方法)

Cited by: 10
Abstract: Automatic text categorization is an effective way to improve the efficiency and quality of information use. An uneven distribution of training documents degrades categorization performance, yet in practice an even distribution is hard to achieve. To address this problem, the authors propose an improved k-NN text categorization method. Experiments on one English and one Chinese text collection show that the improved method not only raises classification accuracy but also behaves more stably.
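Authors: Gong Jun (巩军), Liu Lu (刘鲁)
Source: Journal of the China Society for Scientific and Technical Information (《情报学报》; indexed in CSSCI and the Peking University Core Journals list), 2007, No. 1, pp. 56-59 (4 pages)
Funding: National Natural Science Foundation of China (No. 70371004)
Keywords: text classification; information retrieval; k-NN; algorithm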


