Abstract
This paper describes a text clustering algorithm that applies an improved similarity calculation formula to the text feature matrix generated by domain ontology matching. Experiments on texts from the biomedical domain show that the improved, domain-ontology-based clustering algorithm outperforms both the K-means algorithm and agglomerative hierarchical clustering in terms of both entropy and purity.
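Entropy and purity are the two external measures used in the comparison. The sketch below computes them using their common textbook definitions. It assumes that each document has one reference class label, and it uses base-2 logarithms for entropy. Lower entropy and higher purity indicate a better clustering. The abstract does not state which exact variant the authors used, and the function names `purity` and `entropy` are hypothetical.

```python
import math
from collections import Counter


def _group_by_cluster(clusters, labels):
    """Collect the reference labels of the documents assigned to each cluster."""
    groups = {}
    for cluster_id, label in zip(clusters, labels):
        groups.setdefault(cluster_id, []).append(label)
    return groups


def purity(clusters, labels):
    """Fraction of documents that belong to the majority class of their cluster."""
    groups = _group_by_cluster(clusters, labels)
    majority_total = sum(Counter(members).most_common(1)[0][1]
                         for members in groups.values())
    return majority_total / len(labels)


def entropy(clusters, labels, base=2):
    """Size-weighted average of the class entropy inside each cluster."""
    n = len(labels)
    total = 0.0
    for members in _group_by_cluster(clusters, labels).values():
        size = len(members)
        # Entropy of the class distribution within this one cluster.
        h = -sum((count / size) * math.log(count / size, base)
                 for count in Counter(members).values())
        total += (size / n) * h
    return total


if __name__ == "__main__":
    clusters = [0, 0, 0, 1, 1, 1]           # cluster assigned to each document
    labels = ["a", "a", "b", "b", "b", "b"]  # reference class of each document
    print(purity(clusters, labels), entropy(clusters, labels))
```

In this toy example, cluster 0 contains classes a, a, b and cluster 1 contains only class b. Purity is therefore 5/6, and only cluster 0 contributes to the entropy.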
Source
Information Science (《情报科学》)
CSSCI
Peking University Core Journal (北大核心)
2013, Issue 6, pp. 129-134 (6 pages)
Funding
National Natural Science Foundation of China (71201052)
Hunan University Young Teachers' Fund Project
Keywords
text mining
similarity
clustering
semantics