Abstract
Relation extraction is an important part of information extraction and aims to extract the relations between entities from unstructured text. Deep-learning-based entity relation extraction has achieved some success, but its feature extraction is not comprehensive enough, and there is still considerable room for improvement on the standard experimental metrics. Unlike other natural language tasks such as classification and entity recognition, entity relation extraction depends mainly on information from the sentence and from the two target entities. Based on these characteristics, this paper proposes the SEF-BERT relation extraction model (Fusion Sentence-Entity Features and BERT Model). The model builds on a pre-trained BERT model: after the text has been encoded by the pre-trained BERT model, sentence features and entity features are further extracted. The sentence features and entity features are then fused so that the fused feature vector carries the features of both the sentence and the two entities, which strengthens the model's ability to process feature vectors. Finally, the model was trained and tested on a general-domain dataset and a medical-domain dataset. The experimental results show that SEF-BERT outperforms the other existing models it was compared with on both datasets.
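The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the features are extracted or fused, so everything below is an assumption made for illustration: the sentence feature is taken to be the vector at the first ([CLS]) position, each entity feature is the mean of that entity's token vectors, fusion is plain concatenation, and the classifier is a single linear layer. The names `mean_pool`, `fuse_features`, and `classify` are hypothetical, and random vectors stand in for real BERT outputs.

```python
import random


def mean_pool(vectors):
    """Average a list of equal-length token vectors into one vector."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def fuse_features(hidden, e1_span, e2_span):
    """Concatenate a sentence vector with two entity vectors.

    hidden: per-token vectors (a stand-in for BERT outputs); hidden[0]
            is assumed to be the [CLS] position.
    e1_span, e2_span: (start, end) token indices, end exclusive.
    Concatenation is an assumed fusion scheme, not the paper's.
    """
    sentence = hidden[0]
    e1 = mean_pool(hidden[e1_span[0]:e1_span[1]])
    e2 = mean_pool(hidden[e2_span[0]:e2_span[1]])
    return sentence + e1 + e2  # list concatenation: length 3 * dim


def classify(fused, weights, bias):
    """Apply one linear layer and return the index of the top score."""
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


random.seed(0)
dim, seq_len, num_relations = 4, 8, 3

# Random vectors in place of real encoder outputs (illustration only).
hidden = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(seq_len)]

# Hypothetical entity positions: tokens 2-3 and tokens 5-6.
fused = fuse_features(hidden, (2, 4), (5, 7))

# Untrained random classifier weights, one row per relation label.
weights = [
    [random.uniform(-1, 1) for _ in range(3 * dim)]
    for _ in range(num_relations)
]
bias = [0.0] * num_relations

label = classify(fused, weights, bias)
print(len(fused), label)
```

With this layout the fused vector is three times the hidden size, so the classifier sees the sentence and both entities in a single input. The predicted label here is meaningless because the weights are untrained.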
Authors
DUAN Junhua (段俊花)
ZHU Yian (朱怡安)
SHAO Zhiyun (邵志运)
ZHONG Dong (钟冬)
ZHANG Lixiang (张黎翔)
SHI Xianchen (史先琛)
Affiliations: School of Computer, Northwestern Polytechnical University, Xi'an 710072, Shaanxi, China; School of Software, Northwestern Polytechnical University, Xi'an 710072, Shaanxi, China
Source
Microelectronics & Computer (《微电子学与计算机》)
2022, No. 4, pp. 17-23 (7 pages)
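Funding
National Key Research and Development Program of China (2020YFB1712201)
Key Research and Development Program of Shaanxi Province (2021ZDLGY05-05)
Xi'an Science and Technology Plan (GXYD19.7 and GXYD19.8)
Industrial Internet Innovation and Development Project (TC190A3X8-16-1, TC200H038)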