Abstract
Current research on cross-document subject-predicate-object (SPO) triple extraction is mainly based on sentence-level analysis. However, in many scenarios the SPO elements may be scattered across different locations in a document, and sentence-level extraction techniques fall far short of the requirements. We therefore propose Doc2SpSPO, a joint SPO extraction model. The model first generates initial entity information with a Span candidate set model. It then uses the pre-trained BERT-WWM model to obtain contextual embeddings and embeddings of the candidate entities, and feeds them to classification tasks that extract the SPO triples jointly. Experimental results show that the model achieves an F1 score of 44.4% for entity recognition and an accuracy of 66.9% for relation classification.
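The abstract gives only the outline of Doc2SpSPO, so the following Python sketch shows the generic span-based joint extraction pattern it describes. It is not the paper's implementation. The sketch enumerates a candidate set of spans over the whole document, keeps the spans judged to be entities, and then classifies ordered (subject, object) entity pairs into predicates. Because spans are taken over the whole document's token list, a subject and object from different sentences can still be paired. The BERT-WWM embedding classifiers are replaced by hypothetical plain callables (`is_entity`, `classify_relation`). The names `enumerate_spans`, `extract_spo`, `span_f1`, the `max_width` limit, the rule that skips overlapping pairs, and the toy example are all assumptions made for illustration.

```python
from itertools import permutations
from typing import Callable, List, Optional, Set, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def enumerate_spans(num_tokens: int, max_width: int) -> List[Span]:
    """Candidate set: every contiguous span of 1..max_width tokens."""
    return [
        (start, end)
        for start in range(num_tokens)
        for end in range(start + 1, min(start + max_width, num_tokens) + 1)
    ]


def extract_spo(
    tokens: List[str],
    max_width: int,
    is_entity: Callable[[List[str], Span], bool],
    classify_relation: Callable[[List[str], Span, Span], Optional[str]],
) -> List[Tuple[str, str, str]]:
    """Filter span candidates into entities, then classify ordered entity pairs.

    is_entity and classify_relation are hypothetical stand-ins for the
    embedding-based classifiers described in the abstract.
    """
    entities = [s for s in enumerate_spans(len(tokens), max_width) if is_entity(tokens, s)]
    triples = []
    for subj, obj in permutations(entities, 2):
        if subj[1] > obj[0] and obj[1] > subj[0]:
            continue  # assumption: overlapping spans cannot form a pair
        predicate = classify_relation(tokens, subj, obj)
        if predicate is not None:
            subj_text = " ".join(tokens[subj[0]:subj[1]])
            obj_text = " ".join(tokens[obj[0]:obj[1]])
            triples.append((subj_text, predicate, obj_text))
    return triples


def span_f1(predicted: Set[Span], gold: Set[Span]) -> float:
    """Exact-match span F1, a common metric for entity recognition."""
    true_pos = len(predicted & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy usage with hypothetical lexicon-based classifiers.
TOKENS = ["Marie", "Curie", "was", "born", "in", "Warsaw"]
LEXICON = {("Marie", "Curie"), ("Warsaw",)}


def toy_is_entity(toks: List[str], span: Span) -> bool:
    return tuple(toks[span[0]:span[1]]) in LEXICON


def toy_relation(toks: List[str], subj: Span, obj: Span) -> Optional[str]:
    if toks[subj[0]:subj[1]] == ["Marie", "Curie"] and toks[obj[0]:obj[1]] == ["Warsaw"]:
        return "born_in"
    return None


print(extract_spo(TOKENS, 3, toy_is_entity, toy_relation))
# -> [('Marie Curie', 'born_in', 'Warsaw')]
```

Capping span width at `max_width` keeps the candidate set linear in document length (at most n·w spans) rather than quadratic. Pair classification is still quadratic in the number of retained entities, which is one reason a span-filtering stage comes first.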
Author
Zhang Zhenzeng (章振增)
Linewell Software Co., Ltd., Quanzhou 362000, Fujian, China
Source
Computer Applications and Software (《计算机应用与软件》)
Peking University Core Journal (北大核心)
2023, No. 6, pp. 181-186 and 215 (7 pages in total)