Abstract
Entity relation extraction is one of the key techniques for knowledge graph construction in natural language processing. It helps update and expand knowledge graphs automatically and provides important knowledge base support for downstream tasks. Most current entity relation extraction methods extract features from a single perspective, which limits their feature representation ability. They also suffer from severe accumulation of cascading errors and adapt poorly to overlapping relations and nested entities, which greatly affects the accuracy and efficiency of extraction. To address these problems together, we propose a joint entity relation extraction method that fuses semantic and dependency syntactic information. First, the pre-trained language model BERT is used to extract semantic features. Then, a syntactic attention graph convolutional network is used to obtain dependency syntactic features. Finally, the semantic and dependency syntactic features are fused to predict and tag the positions of the subject and object entities for the multiple relations in a sentence. Experimental results show that the proposed model reaches F1 scores of 92.8% and 91.1% on the public NYT and WebNLG datasets, respectively. Compared with the baseline model and other deep learning models, the proposed model achieves better results on overlapping entity extraction, which verifies its effectiveness.
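The three-stage pipeline named in the abstract (semantic features, attention-weighted graph convolution over the dependency graph, feature fusion) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not give the model's equations, so the dot-product attention, the absence of trainable weights, the concatenation-based fusion, the toy vectors standing in for BERT outputs, and the helper names `syntactic_attention_gcn_layer` and `fuse` are all hypothetical and are not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def syntactic_attention_gcn_layer(h, adj):
    """One propagation step over the dependency graph (hypothetical form).

    Each token aggregates the vectors of its dependency neighbours (plus
    itself), weighted by softmax-normalised dot-product attention, then
    applies ReLU. A trained layer would also apply a learned weight
    matrix; it is omitted here to keep the sketch short.
    """
    n = len(h)
    dim = len(h[0])
    out = []
    for i in range(n):
        # Neighbours are tokens joined by a dependency arc, plus a self-loop.
        nbrs = [j for j in range(n) if adj[i][j] or i == j]
        weights = softmax([dot(h[i], h[j]) for j in nbrs])
        agg = [sum(w * h[j][k] for w, j in zip(weights, nbrs)) for k in range(dim)]
        out.append([max(0.0, v) for v in agg])  # ReLU
    return out


def fuse(semantic, syntactic):
    """Fuse the two views by per-token concatenation (assumed fusion rule)."""
    return [s + t for s, t in zip(semantic, syntactic)]


# Toy 2-d vectors standing in for BERT outputs of a three-token sentence.
semantic = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
# Symmetric adjacency: the middle token (the verb) is linked to both others.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]

syntactic = syntactic_attention_gcn_layer(semantic, adj)
fused = fuse(semantic, syntactic)
```

In a full model of this kind, the fused per-token vectors would feed a tagging head that marks subject and object positions for each relation; that head is not sketched here because the abstract does not describe it.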
Authors
胡翼
于海
郭鑫
陈千
廖健
郑建兴
李艳红
杨可涵
HU Yi; YU Hai; GUO Xin; CHEN Qian; LIAO Jian; ZHENG Jian-xing; LI Yan-hong; YANG Ke-han (School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China; China Mobile Communications Group Shanxi Co., Ltd., Taiyuan 030024, China)
Source
《计算机技术与发展》
2024, Issue 8, pp. 93-100 (8 pages)
Computer Technology and Development
Funding
National Natural Science Foundation of China (62076158)
Natural Science Foundation of Shanxi Province (20220302122021, 20210302123468, 202203021221001)
CCF-Zhipu AI Large Model Fund (CCF-Zhipu202310).
Keywords
relation extraction
syntactic dependency analysis
graph convolutional neural network
feature fusion
relation overlap