Abstract
Using the definitely relevant query pairs provided in the OHSUMED corpus as both gold standard and research material, subject heading-source literature matrices and co-word matrices were generated with the BICOMB software, and similarity (dissimilarity) matrices were derived from them for a range of coefficients. The clustering results of SPSS-based co-occurrence cluster analysis, as currently practiced in China, were then compared across three factors: the subject heading-source literature matrix versus the co-word matrix, the choice of similarity coefficient, and the method of calculating between-cluster distance. The results show that the subject heading-source literature matrix clusters better than the co-word matrix and should be preferred in cluster analysis. When a co-word matrix is used, the similarity coefficient should be chosen according to the properties of the actual matrix data, and attention should be paid to whether the clustering method is correct in principle.
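The two matrix types compared in the abstract can be illustrated with a minimal sketch. The toy documents and the helper names `co_word_matrix` and `ochiai` are hypothetical. The abstract does not name the coefficients it tested, so the Ochiai coefficient is used here only as one common example of a similarity coefficient. BICOMB and SPSS are not involved in this sketch.

```python
from math import sqrt

# Hypothetical toy data: each document is the set of subject headings assigned to it.
docs = [
    {"Asthma", "Child", "Steroids"},
    {"Asthma", "Steroids"},
    {"Hypertension", "Aged"},
    {"Hypertension", "Aged", "Steroids"},
]

# Fixed term order so that matrix rows are reproducible.
terms = sorted({t for d in docs for t in d})

# Subject heading-source literature matrix: rows are terms, columns are documents,
# and a cell is 1 when the term is assigned to the document.
term_doc = [[1 if t in d else 0 for d in docs] for t in terms]


def co_word_matrix(matrix):
    """Co-word counts: entry (i, j) is the number of documents containing both terms."""
    n = len(matrix)
    return [
        [sum(a * b for a, b in zip(matrix[i], matrix[j])) for j in range(n)]
        for i in range(n)
    ]


def ochiai(co):
    """Ochiai similarity c_ij / sqrt(c_ii * c_jj), an assumed example coefficient."""
    n = len(co)
    return [
        [co[i][j] / sqrt(co[i][i] * co[j][j]) for j in range(n)]
        for i in range(n)
    ]


co = co_word_matrix(term_doc)
sim = ochiai(co)

for term, row in zip(terms, sim):
    print(term, [round(v, 2) for v in row])
```

The term-document matrix keeps one column per source document, whereas the co-word matrix collapses the documents into pairwise counts. Either matrix can then be passed to a hierarchical clustering routine, which is the comparison the abstract reports.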
Source
《中华医学图书情报杂志》
CAS
2016, No. 1, pp. 52-56 (5 pages)
Chinese Journal of Medical Library and Information Science
Keywords
SPSS
Cluster analysis
Co-occurrence analysis
Correlation coefficient