Text Categorization Method Based on Improved KNN (cited by: 19)
Abstract: This paper addresses the shortcomings of the K-nearest neighbor (KNN) algorithm in the vector space model (VSM) when applied to text processing. Drawing on self-organizing map (SOM) neural network theory, feature selection, and pattern aggregation theory, it proposes an improved KNN method for text categorization. Feature selection and pattern aggregation are applied to reduce the dimensionality of the feature space. Because the conventional VSM assigns the same weight to every dimension, which is ill-suited to text processing, the paper proposes using an SOM neural network to compute the weight of each VSM dimension. Combining the two improvements effectively reduces the dimensionality of the vector space and improves both the accuracy and the speed of text categorization.
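As a rough illustration of the idea of weighting VSM dimensions differently inside KNN, the following is a minimal sketch, not the authors' implementation. It assumes the per-dimension weights are already available as a list of non-negative numbers; how the paper derives them from an SOM network is not reproduced here, and the function names `weighted_cosine` and `knn_classify`, the similarity-weighted vote, and the toy data are all illustrative assumptions.

```python
import math
from collections import Counter


def weighted_cosine(a, b, w):
    """Cosine similarity of vectors a and b under non-negative per-dimension weights w."""
    dot = sum(wi * x * y for wi, x, y in zip(w, a, b))
    norm_a = math.sqrt(sum(wi * x * x for wi, x in zip(w, a)))
    norm_b = math.sqrt(sum(wi * y * y for wi, y in zip(w, b)))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def knn_classify(query, train_vectors, train_labels, weights, k=3):
    """Label the query by a similarity-weighted vote among its k nearest neighbors."""
    scored = sorted(
        ((weighted_cosine(query, vec, weights), label)
         for vec, label in zip(train_vectors, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter()
    for similarity, label in scored[:k]:
        votes[label] += similarity
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): three term dimensions, two categories.
train_vectors = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.8, 0.2]]
train_labels = ["sports", "sports", "finance", "finance"]
weights = [1.0, 1.0, 0.5]  # assumed to be given; stands in for learned weights

predicted = knn_classify([1.0, 0.2, 0.0], train_vectors, train_labels, weights, k=3)
print(predicted)
```

Setting every entry of `weights` to 1.0 recovers ordinary cosine-similarity KNN, which corresponds to the equal-weight VSM that the abstract describes as poorly suited to text.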
Source: Information Science (《情报科学》), CSSCI, PKU Core Journal, 2005, No. 4, pp. 550-554 (5 pages)
Funding: National Natural Science Foundation of China (Grant No. 60275020)
Keywords: text categorization; feature selection; SOM network; vector space model; KNN; pattern aggregation