Semi-supervised medical event extraction based on pseudo-label confidence selection (cited by: 2)
Abstract Medical event extraction is an important foundation for constructing medical knowledge graphs. To address the scarcity of labeled data in the medical domain, a joint medical event extraction model based on a Transformer encoder, BiLSTM, and an attention mechanism is constructed, and a pseudo-label confidence selection algorithm for selecting high-confidence data is proposed. First, the joint extraction model is trained and used to predict on unlabeled data, producing pseudo-labeled data. Second, high-confidence pseudo-labeled data are selected by computing the pseudo-label consensus probability P and are added to the original data to retrain the joint extraction model. Finally, the updated joint extraction model is used to extract tumor primary site, lesion size, and metastatic site events from electronic medical records, and majority voting is used to obtain the final extraction results. Experiments on the corpus of the medical event extraction task for Chinese electronic medical records from the 2020 China Conference on Knowledge Graph and Semantic Computing (CCKS2020) show that the proposed method achieves good medical event extraction results.
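The selection and voting steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not define how the consensus probability P is computed, so this sketch assumes that several trained models each predict a label for the same unlabeled sample and that P is the fraction of models agreeing with the most common prediction. The function names `consensus` and `select_pseudo_labels` and the threshold value 0.8 are hypothetical choices for illustration, not taken from the paper.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def consensus(predictions: Sequence[Hashable]) -> Tuple[Hashable, float]:
    """Return the majority-vote label and the share of models that chose it.

    ASSUMPTION: P is taken to be the agreement ratio among models; the
    paper's actual definition of P is not given in the abstract.
    Ties are broken in favor of the label that appears first.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    label, votes = Counter(predictions).most_common(1)[0]
    return label, votes / len(predictions)


def select_pseudo_labels(
    samples: Sequence[str],
    per_model_predictions: Sequence[Sequence[Hashable]],
    threshold: float = 0.8,  # hypothetical cutoff, not from the paper
) -> List[Tuple[str, Hashable]]:
    """Keep only samples whose agreement ratio P reaches the threshold."""
    selected = []
    for sample, preds in zip(samples, per_model_predictions):
        label, p = consensus(preds)
        if p >= threshold:
            selected.append((sample, label))
    return selected
```

Under these assumptions, the selected pairs would be appended to the labeled training set before retraining, and the same `consensus` helper could serve as the final majority vote over the retrained models' outputs.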
Authors LIANG Wentong; ZHU Yanhui; ZHAN Fei; ZHANG Xu; OUYANG Kang; KONG Lingwei; HUANG Yalin (School of Computer, Hunan University of Technology, Zhuzhou 412007, China; Hunan Key Laboratory of Intelligent Information Perception and Processing Technology, Zhuzhou 412007, China)
Source Microelectronics & Computer, 2022, No. 1, pp. 71-79 (9 pages)
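Funding National Natural Science Foundation of China (61702177); Natural Science Foundation of Hunan Province (2018JJ2098, 2020JJ6089); Key Project of the Education Department of Hunan Province (19A133).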
Keywords medical event extraction; knowledge graph; attention mechanism; joint extraction; pseudo-label; electronic medical record; majority voting