Abstract
Building a clinical ontology for traditional Chinese medicine (TCM) is an important part of the internationalization of TCM. Research on clinical term entities has produced substantial results, but research on the semantic relations between them remains limited. This paper proposes a method that combines clustering with syntax patterns to study the semantic relations between TCM clinical concept entities. Feature words surrounding the entities are extracted, and K-means is used to perform a first round of clustering over the whole corpus. Based on the first-round results, the longest common subsequence is extracted within each cluster and generalized into a syntax pattern. Using the manually adjusted syntax patterns, the pattern that best expresses the semantic relation in each sentence of the corpus is determined automatically, and a second round of clustering is performed with the syntax patterns as features; its output is the final clustering result. Experimental results show that the method classifies the semantic relations in the corpus with an accuracy of 88.23%.
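The pattern-extraction step in the abstract (longest common subsequence within a cluster) can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration and is not taken from the paper: the function names `lcs` and `cluster_pattern`, the `<E1>`/`<E2>` entity placeholders, and the English toy sentences standing in for a tokenized Chinese corpus. Folding a pairwise LCS across the cluster is also a simple heuristic, not necessarily the procedure the authors used, and the manual adjustment of patterns is not modeled.

```python
from functools import reduce


def lcs(a, b):
    """Return one longest common subsequence of two token lists."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the start to recover the tokens of one LCS.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def cluster_pattern(sentences):
    """Heuristic pattern for one cluster: fold pairwise LCS over its sentences."""
    return reduce(lcs, sentences)


# Hypothetical cluster: entities already replaced by placeholders (assumption).
sentences = [
    ["<E1>", "is", "often", "treated", "with", "<E2>"],
    ["<E1>", "can", "be", "treated", "with", "<E2>"],
    ["<E1>", "is", "usually", "treated", "with", "the", "<E2>"],
]
pattern = cluster_pattern(sentences)
print(pattern)  # ['<E1>', 'treated', 'with', '<E2>']
```

In this sketch the resulting token sequence plays the role of a syntax pattern; a second clustering round could then use "which pattern a sentence matches" as its feature.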
Source
《世界科学技术-中医药现代化》
CSCD
2017, Issue 12, pp. 1949-1953 (5 pages)
Modernization of Traditional Chinese Medicine and Materia Medica-World Science and Technology
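Funding
National Natural Science Foundation of China, Young Scientists Fund: Research on the Construction of an Ontology-Based TCM Diagnosis and Treatment Information Model (81403281)
Principal investigator: Cao Xinyu (曹馨宇)
Ministry of Science and Technology of China, National Science and Technology Support Program: Research on Real-World Clinical Research Methodology for TCM Based on "Disease-Symptom Combination" (2013BAI02B10)
Principal investigator: Xie Qi (谢琪)
China Academy of Chinese Medical Sciences, Fundamental Research Funds: Research on the Development Strategy and Construction Plan of the National TCM Data Center (ZZ060815)
Principal investigator: Wang Bin (王斌)
National Key R&D Program of China: Research on Multi-Dimensional, Multi-Layer, Polymorphic TCM Knowledge Graphs and Spatiotemporal Evolution Models (2017YFB1002302)
Principal investigator: Li Zongyou (李宗友)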
Keywords
Chinese medicine, semantic relation, clustering, syntax pattern