
New text categorization method based on the frequency of topic words
(Original title: 基于主题词频数特征的文本主题划分; cited by: 11)
Abstract: The document-term frequency matrix currently used in text categorization is both very high-dimensional and very sparse, which makes computation difficult. To address this problem, and motivated by how search-engine users pick out the documents they need, a topic partitioning method based on topic-word frequency features is proposed. The method first selects the topic words of each text class by statistical means, then clusters the documents with the fuzzy C-means (FCM) algorithm, using classes of topic words rather than individual words as features. The method achieved fairly good topic partitioning in the experiment. It was also compared, on several aspects of both process and results, with a document clustering method based on keyword clusters, and some useful conclusions about implementation and application background were reached.
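The two steps named in the abstract (topic-word class features, then FCM clustering) can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say which statistical filter selects the topic words, so the two topic-word classes below are hand-picked, hypothetical stand-ins, and normalising each row by the number of topic-word hits is likewise an assumption. The function names `topic_class_features` and `fcm` are invented for this example; the FCM update rules are the standard ones with fuzzifier `m`.

```python
import random


def topic_class_features(docs, topic_classes):
    """For each document, count tokens that fall in each topic-word class,
    then normalise the counts so each row sums to 1 (an assumption)."""
    features = []
    for tokens in docs:
        counts = [sum(1 for t in tokens if t in words) for words in topic_classes]
        total = sum(counts) or 1  # guard against documents with no topic words
        features.append([c / total for c in counts])
    return features


def fcm(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy C-means: returns (centers, membership matrix)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        s = sum(row)
        u.append([v / s for v in row])
    centers = []
    for _ in range(n_iter):
        # Center update: mean of the points weighted by u_ik ** m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            sw = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / sw for d in range(dim)]
            )
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)).
        new_u = []
        for x in points:
            dist = [
                max(sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5, 1e-12)
                for c in centers
            ]
            new_u.append(
                [
                    1.0 / sum((dist[k] / dist[j]) ** (2.0 / (m - 1.0))
                              for j in range(n_clusters))
                    for k in range(n_clusters)
                ]
            )
        shift = max(abs(new_u[i][k] - u[i][k])
                    for i in range(n) for k in range(n_clusters))
        u = new_u
        if shift < tol:
            break
    return centers, u


# Hypothetical topic-word classes (hand-picked here, not statistically selected).
topic_classes = [{"team", "goal", "match"}, {"stock", "bank", "market"}]

# Toy tokenised documents.
docs = [
    "the team scored a goal in the match for the team".split(),
    "the team won the match with a late goal team bank".split(),
    "the stock market and the bank".split(),
    "stock bank market market report on a team".split(),
]

features = topic_class_features(docs, topic_classes)
centers, memberships = fcm(features, n_clusters=2)
# Hard labels: the cluster with the highest membership for each document.
labels = [row.index(max(row)) for row in memberships]
```

Each document is reduced to as many dimensions as there are topic-word classes (two here) instead of one dimension per vocabulary word, which is the dimensionality and sparsity reduction the abstract describes. The membership matrix keeps the fuzzy assignment; `labels` is only a convenience for reading off the dominant topic.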
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2006, No. 8, pp. 1993-1995 (3 pages)
Funding: Supported by the Xiamen University 985 Phase II Information Innovation Platform Project (0000-X07204)
Keywords: search engine; document clustering; fuzzy C-means (FCM); topic word filtering

References (7)

  • [1] DEERWESTER S, DUMAIS S T, LANDAUER T K, et al. Indexing by latent semantic analysis[J]. Journal of the American Society for Information Science, 1990, 41(6): 391-407.
  • [2] CHANG H-C, HSU C-C. Using topic keyword clusters for automatic document clustering[J]. IEICE Transactions on Information and Systems, 2005, E88-D(8): 1852-1860.
  • [3] CHANG H C, HSU C C, DENG Y W. Automatic document clustering based on keyword clusters using partitions of weighted undirected graph[A]. Proceedings of 2003 Symposium on Digital Life and Internet Technologies[C]. 2003.
  • [4] HSIEH S M, HUANG S J, HSU C C, et al. Personal document recommendation system based on data mining techniques[A]. Proceedings of 2004 IEEE/WIC/ACM International Joint Conference on Web Intelligence[C]. 2004. 51-57.
  • [5] HUANG Z X, NG M K. A fuzzy k-modes algorithm for clustering categorical data[J]. IEEE Transactions on Fuzzy Systems, 1999, 7(4).
  • [6] HUANG Z X. Extensions to the k-means algorithm for clustering large data sets with categorical values[J]. Data Mining and Knowledge Discovery, 1998, 2(3): 283-304.
  • [7] BAEZA-YATES R, RIBEIRO-NETO B. Modern Information Retrieval[M]. ACM Press, 1999.

