
Research on COVID-19 Text Entity Relation Extraction and Dataset Construction Methods
Cited by: 1

Abstract: Entity relation extraction can effectively obtain key information from text, and the key information in COVID-19 texts can help cut off transmission routes and trace the sources of the epidemic. However, no suitable publicly available annotated dataset exists for this domain. To address this problem, the semantic representation and structural characteristics of COVID-19 texts are analyzed, and an entity relation definition tailored to COVID-19 texts is proposed. The collected data are then annotated with entities and relations according to this definition, and once annotation is complete, data preprocessing and related operations are applied to generate a COVID-19 text entity relation extraction dataset. Compared with public datasets, entities and relations in this dataset are more densely distributed, and a single neural network model extracts features from it poorly. Therefore, several neural network models are concatenated to build the named entity recognition model and the relation extraction model. The dataset is validated experimentally using the results of these models, and the results show that it can be applied to entity relation extraction tasks in this domain.
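The abstract does not specify how the entity and relation annotations are stored in the generated dataset. As a minimal sketch only, assuming character-level tokens (common for Chinese NER), a flat BIO tagging scheme, and hypothetical entity labels `PER`/`LOC` and relation label `visited` (none of which come from the paper), converting span annotations into one training record could look like this. The names `Entity`, `to_bio`, and `build_sample` are illustrative, not the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """One annotated entity span (hypothetical schema, not the paper's format)."""
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str  # hypothetical entity type, e.g. "PER" or "LOC"


def to_bio(text: str, entities: list[Entity]) -> list[str]:
    """Convert character-level entity spans into one BIO tag per character."""
    tags = ["O"] * len(text)
    for ent in entities:
        if not 0 <= ent.start < ent.end <= len(text):
            raise ValueError(f"span out of range: {ent}")
        # Flat BIO cannot encode nested or overlapping entities, so reject them.
        if any(tag != "O" for tag in tags[ent.start:ent.end]):
            raise ValueError(f"overlapping span: {ent}")
        tags[ent.start] = "B-" + ent.label
        for i in range(ent.start + 1, ent.end):
            tags[i] = "I-" + ent.label
    return tags


def build_sample(
    text: str,
    entities: list[Entity],
    relations: list[tuple[int, str, int]],
) -> dict:
    """Bundle characters, BIO tags, and relation triples into one record.

    relations: (head_idx, relation_label, tail_idx), indices into `entities`.
    """
    def surface(idx: int) -> str:
        ent = entities[idx]
        return text[ent.start:ent.end]

    return {
        "tokens": list(text),  # character-level tokens
        "tags": to_bio(text, entities),
        "relations": [(surface(h), rel, surface(t)) for h, rel, t in relations],
    }


# Hypothetical example: "患者张某到过武汉" ("Patient Zhang had been to Wuhan").
sample = build_sample(
    "患者张某到过武汉",
    [Entity(2, 4, "PER"), Entity(6, 8, "LOC")],
    [(0, "visited", 1)],
)
print(sample["tags"])       # → ['O', 'O', 'B-PER', 'I-PER', 'O', 'O', 'B-LOC', 'I-LOC']
print(sample["relations"])  # → [('张某', 'visited', '武汉')]
```

Keeping entity tags and relation triples in the same record lets one dataset file feed both the named entity recognition model and the relation extraction model; a scheme that supports nested entities would need a different encoding than flat BIO.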
Authors: YANG Chongluo (杨崇洛), SHENG Long (生龙), WEI Zhongcheng (魏忠诚), WANG Wei (王巍) (College of Information and Electrical Engineering, Hebei University of Engineering, Handan, Hebei 056038, China; Hebei Key Laboratory of Security Protection Information Sensing and Processing, Hebei University of Engineering, Handan, Hebei 056038, China)
Source: Computer Engineering and Applications (《计算机工程与应用》; indexed in CSCD and the Peking University core journal list), 2023, No. 8, pp. 97-104 (8 pages)
Funding: National Natural Science Foundation of China (61802107); Science and Technology Research Project of Higher Education Institutions of Hebei Province (QN2020193, ZD2020171).
Keywords: dataset; entity and relationship definition; data labeling; bidirectional recurrent neural network; convolutional neural network
Related literature: 9 references; 61 secondary references; 319 co-citing documents; 3 co-cited documents; 1 citing document.