Abstract
Current machine learning methods for resource entity and relation extraction in the chemical domain suffer from low recall and rely heavily on hand-crafted features and domain knowledge. To address these problems, this paper proposes a joint extraction method based on fused tagging of entity and relation information (Information Fusion Tagging-Joint Model, IFT-Joint). The method makes two main improvements. First, a novel tagging scheme converts the joint extraction task into a sequence labeling problem, which alleviates the overlapping-relation problem in joint extraction. Second, from the sequence labeling perspective, a joint entity and relation extraction model based on BERT (Bidirectional Encoder Representations from Transformers) is proposed. Multiple groups of experiments show that, on a chemical-domain entity dataset, the recall of IFT-Joint exceeds 75%, a clear improvement over the other methods compared in the paper, and that the method is stable.
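The abstract does not spell out the tag format, so the following is only a minimal sketch of how a joint extraction task can be recast as sequence labeling. It assumes a hypothetical tag layout of position, entity type, relation type, and role (1 for the head entity, 2 for the tail entity) joined by hyphens. The function name `fuse_tags`, the span format, the `CHEM` and `DissolvesIn` labels, and the toy sentence are all illustrative assumptions, not the scheme or data used in the paper.

```python
from typing import List, Tuple

# Hypothetical span format: (start index, end index exclusive, entity type).
Span = Tuple[int, int, str]


def fuse_tags(n_tokens: int, head: Span, tail: Span, relation: str) -> List[str]:
    """Encode one (head, relation, tail) triple as a single tag per token.

    Assumed tag layout (not taken from the paper):
    <B|I>-<entity type>-<relation>-<role>, with "O" for all other tokens.
    """
    tags = ["O"] * n_tokens
    for role, (start, end, etype) in (("1", head), ("2", tail)):
        for i in range(start, end):
            position = "B" if i == start else "I"
            tags[i] = f"{position}-{etype}-{relation}-{role}"
    return tags


# Toy example, made up for illustration only.
tokens = ["Sodium", "chloride", "dissolves", "in", "water"]
tags = fuse_tags(len(tokens), (0, 2, "CHEM"), (4, 5, "CHEM"), "DissolvesIn")
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

With tags of this kind, a token classifier (for example, a BERT encoder with a per-token classification layer) can predict entities and relations in one pass. This sketch assigns a single tag per token and therefore does not show how overlapping relations are handled; the abstract only states that the proposed scheme alleviates that problem.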
Authors
Ma Jianhong (马建红)
Wei Zimo (魏字默)
Chen Yameng (陈亚萌)
School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401, China
Source
Computer Applications and Software (《计算机应用与软件》)
Peking University Core Journal
2021, No. 7, pp. 159-166 (8 pages)
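Funding
Innovative Software Design and Public Application Service Platform Project of the Hebei Provincial Department of Science and Technology (15240118D).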
Keywords
Information fusion tagging
Joint extraction
Sequence labeling
Overlapping relations
BERT