Text Classification Algorithm Based on Concept Clusters (基于概念簇的文本分类算法)

Cited by: 2
Abstract: Traditional text classification algorithms that represent text with the vector space model suffer from high-dimensional, sparse vectors and ignore the semantic correlation between features, which leads to low classification efficiency and poor accuracy. Using HowNet as the knowledge base, this paper constructs a Semantic Concept Vector Model (SCVM) to represent text, merging synonyms and disambiguating polysemous words according to concept semantics and context. It then proposes TCABCC (Text Classification Algorithm Based on the Concept of Clusters), which improves on traditional KNN by representing the training samples of each category as concept clusters, so that similarity is computed between a text's concept vector and the category concept clusters. The experimental results show that the classifier built with this algorithm improves considerably on traditional KNN in both efficiency and performance.
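The idea in the abstract, comparing a text's concept vector with one summary per category instead of with every training sample as plain KNN does, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's TCABCC implementation: `TOY_CONCEPTS` is a hypothetical hand-written word-to-concept table standing in for a HowNet lookup, each category is summarized by a single averaged concept vector (the abstract does not say how the concept clusters are actually built), cosine similarity is an assumed choice of measure, and polysemy disambiguation is omitted entirely.

```python
import math
from collections import Counter, defaultdict

# Hypothetical word -> concept table; a stand-in for a real HowNet lookup.
TOY_CONCEPTS = {
    "car": "vehicle", "automobile": "vehicle", "truck": "vehicle",
    "engine": "machine", "motor": "machine",
    "doctor": "medical", "physician": "medical", "hospital": "medical",
    "medicine": "drug", "drug": "drug",
}


def concept_vector(tokens):
    """Count concepts instead of words, so synonyms share one dimension.

    Words missing from the table are dropped (an assumption of this sketch).
    """
    return Counter(TOY_CONCEPTS[t] for t in tokens if t in TOY_CONCEPTS)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_clusters(training_docs):
    """Summarize each category as one averaged concept vector.

    Using a single centroid per category is an assumed simplification.
    """
    sums, counts = defaultdict(Counter), Counter()
    for tokens, label in training_docs:
        sums[label].update(concept_vector(tokens))
        counts[label] += 1
    return {
        label: {c: w / counts[label] for c, w in vec.items()}
        for label, vec in sums.items()
    }


def classify(tokens, clusters):
    """Return the category whose cluster is most similar to the text."""
    vec = concept_vector(tokens)
    return max(clusters, key=lambda label: cosine(vec, clusters[label]))


# Toy training set, made up for illustration only.
TRAIN = [
    (["car", "engine"], "auto"),
    (["truck", "motor"], "auto"),
    (["doctor", "medicine"], "health"),
    (["hospital", "drug"], "health"),
]
CLUSTERS = build_clusters(TRAIN)
```

With this toy data, `classify(["automobile", "motor"], CLUSTERS)` returns `"auto"` even though "automobile" never appears in the training set, because it maps to the same concept as "car" and "truck". Each query is also compared against one vector per category rather than against every training document, which is the kind of efficiency gain the abstract attributes to using concept clusters.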
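Source: Library and Information Service (《图书情报工作》), CSSCI, Peking University Core Journal, 2013, No. 15, pp. 132-136, 82 (6 pages in total)
Funding: One of the research outputs of the Jiangsu Provincial Department of Education Philosophy and Social Science Project for Universities, "Research on Personalized Information Service Models for Network Resources" (Project No. 2012SJD870001)
Keywords: text classification; semantic concept vector; concept cluster; KNN; HowNet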