Abstract
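Using an ontology learning approach, this study takes the unstructured and semi-structured Chinese text of plant-category entries in Baidu Encyclopedia (Baidu Baike) as its corpus. A supervised lexico-syntactic pattern method based on dependency parsing is used to acquire concepts, taxonomic relations, and non-taxonomic relations in the plant domain. Two measures improve the precision of relation extraction considerably: a lexicon-based filtering method and a method that adds constraints to the patterns. Together they allow a heavyweight ontology to be built automatically on top of a lightweight ontology. The method establishes a concept hierarchy for a domain-specific corpus and improves the discovery of the most representative taxonomic and non-taxonomic relations. It also expresses the extraction results formally in OWL. Experiments show that the method achieves good results in non-taxonomic relation extraction, which lays a foundation for building a knowledge graph of this domain.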
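The abstract does not publish its pattern set. The sketch below is only a rough illustration. It matches one assumed lexico-syntactic pattern, subject (SBV) ← verb → object (VOB), over a hand-built dependency parse. It then applies the two precision measures the abstract names: a lexicon filter on the arguments and a constraint on which verbs may trigger a relation. The `Token` layout, the LTP-style labels, and the `DOMAIN_TERMS` and `ALLOWED_VERBS` word lists are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    idx: int   # 1-based position in the sentence
    word: str
    head: int  # index of the head token; 0 marks the root
    rel: str   # dependency label, LTP-style (SBV = subject, VOB = object, HED = root)


# Hypothetical domain lexicon (plant names and related terms) used for filtering.
DOMAIN_TERMS = {"月季", "玫瑰", "蔷薇科", "中国"}
# Hypothetical pattern constraint: only these predicates may trigger a relation.
ALLOWED_VERBS = {"原产", "分布", "喜"}


def extract_svo(tokens, lexicon, allowed_verbs):
    """Match SBV <- verb -> VOB and keep the triples that pass both filters."""
    triples = []
    for verb in tokens:
        if verb.word not in allowed_verbs:  # constraint on the pattern's predicate
            continue
        subjects = [t for t in tokens if t.head == verb.idx and t.rel == "SBV"]
        objects = [t for t in tokens if t.head == verb.idx and t.rel == "VOB"]
        for s in subjects:
            for o in objects:
                # Lexicon filter: both arguments must be known domain terms.
                if s.word in lexicon and o.word in lexicon:
                    triples.append((s.word, verb.word, o.word))
    return triples


# 月季原产中国 ("the Chinese rose is native to China"), parsed by hand.
sentence = [
    Token(1, "月季", 2, "SBV"),
    Token(2, "原产", 0, "HED"),
    Token(3, "中国", 2, "VOB"),
]
print(extract_svo(sentence, DOMAIN_TERMS, ALLOWED_VERBS))  # → [('月季', '原产', '中国')]
```

Some sentences yield no triple. For 它原产中国 ("it is native to China"), the pronoun 它 is not in the lexicon. For 他种植月季 ("he plants Chinese roses"), the verb 种植 is not in `ALLOWED_VERBS`. This shows how the two filters trade recall for precision.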
To provide more specific knowledge and technology for the plant domain, the main task of knowledge graph (KG) construction is to extract a wealth of concepts and relations. Relation extraction is the most difficult step in KG construction. This paper therefore applies ontology learning and proposes a non-taxonomic relation learning method. The method uses lexico-syntactic patterns based on dependency grammar analysis to obtain representative concepts and their relations from the unstructured and semi-structured texts of Baidu Encyclopedia entries. In addition, constraints were added to the patterns and words were filtered against a lexicon. These two measures allowed a heavyweight ontology to be built automatically from a lightweight ontology and greatly improved the precision of relation extraction. The approach established a concept structure from the plant domain corpus and improved the discovery of the most representative non-taxonomic relations. It then formalized these relations in the standardized OWL 2 language. A set of experiments was performed by applying the approach to the plant domain. The results indicated that pattern-based extraction should be performed directly after natural language processing. They also showed that this extraction achieves comparatively high accuracy relative to earlier algorithms and extracts non-taxonomic relations effectively. This lays the foundation for KG construction in the plant domain.
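The paper does not say which OWL constructs it uses for the output step. The following is a minimal sketch under stated assumptions. It takes extracted triples as (subject, relation, object) strings and treats both arguments as OWL classes. Each relation becomes an `owl:ObjectProperty`, and each triple becomes an existential restriction (`rdfs:subClassOf` an `owl:Restriction` with `owl:someValuesFrom`), written in Turtle. The `pl:` namespace is a placeholder, and term strings are assumed to be single tokens that are valid Turtle local names.

```python
# Placeholder namespace for the plant ontology (hypothetical).
PREFIXES = """@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix pl: <http://example.org/plant#> .
"""


def to_turtle(triples):
    """Serialize (subject, relation, object) triples as OWL 2 axioms in Turtle."""
    classes = sorted({t[0] for t in triples} | {t[2] for t in triples})
    props = sorted({t[1] for t in triples})
    lines = [PREFIXES]
    for c in classes:
        lines.append(f"pl:{c} a owl:Class .")
    for p in props:
        lines.append(f"pl:{p} a owl:ObjectProperty .")
    for s, p, o in triples:
        # Read as "every s is related by p to some o" (existential restriction).
        lines.append(
            f"pl:{s} rdfs:subClassOf [ a owl:Restriction ; "
            f"owl:onProperty pl:{p} ; owl:someValuesFrom pl:{o} ] ."
        )
    return "\n".join(lines) + "\n"


print(to_turtle([("月季", "原产", "中国")]))
```

The existential restriction keeps the statement at the class level, which suits class-to-class relations. If the object denotes a single individual, such as one country, `owl:hasValue` on a named individual would be the closer fit.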
Source
Transactions of the Chinese Society for Agricultural Machinery (《农业机械学报》)
Indexed in: EI, CAS, CSCD, Peking University Core Journals
2016, Issue 9, pp. 278-284 (7 pages)
Funding
National Natural Science Foundation of China (61503386)
Keywords
plant domain ontology
knowledge graph
non-taxonomic relation
ontology learning
Baidu Encyclopedia