Abstract
To address the inability of existing methods to capture long-distance dependencies between entities in electronic medical records, this paper proposes a named entity recognition method that combines self-attention with BiLSTM-CRF. The input text is first converted into a numerical form that the neural network can process; a BiLSTM network combined with self-attention then computes an output feature vector for each character; finally, a CRF layer finds the most suitable output tag sequence for the sentence, which determines the named entities. Experiments on the CCKS2018 dataset show that the improved method is reasonably well suited to electronic medical records and that, compared with existing methods, accuracy on the test set improves by 6.50 to 9.25 percentage points.
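The three stages named in the abstract (numerical encoding, self-attention over per-character features, and CRF decoding) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `encode_chars`, `self_attention`, and `viterbi_decode` are hypothetical, the BiLSTM is omitted and its outputs are stood in for by an arbitrary array, the attention uses the features directly as queries, keys, and values with no learned projections, and the CRF transition scores are supplied by hand rather than learned.

```python
import numpy as np


def encode_chars(text, vocab):
    """Map each character to an integer id; unknown characters map to 0."""
    return [vocab.get(ch, 0) for ch in text]


def self_attention(h):
    """Scaled dot-product self-attention with h as query, key, and value.

    h: (seq_len, dim) array standing in for BiLSTM outputs (assumption).
    Returns the attended features and the attention weight matrix.
    """
    d = h.shape[-1]
    scores = h @ h.T / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ h, weights


def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag sequence for a linear-chain CRF.

    emissions: (seq_len, num_tags) per-position tag scores.
    transitions[i, j]: score of moving from tag i to tag j.
    """
    seq_len, num_tags = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((seq_len, num_tags), dtype=int)
    for t in range(1, seq_len):
        # candidate[i, j]: best path ending in tag i at t-1, then tag j at t
        candidate = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = candidate.argmax(axis=0)
        score = candidate.max(axis=0)
    # Follow back-pointers from the best final tag to recover the path
    best = [int(score.argmax())]
    for t in range(seq_len - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]
```

In a full model the emission scores would come from a learned projection of the attended BiLSTM features, and the transition matrix would be trained jointly with the rest of the network; the abstract does not give those details, so they are left out here.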
Authors
Zeng Qingxia (曾青霞)
Xiong Wangping (熊旺平)
Du Jianqiang (杜建强)
Nie Bin (聂斌)
Guo Rongchuan (郭荣传)
Affiliations: Qihuang Medical College, Jiangxi University of Traditional Chinese Medicine, Nanchang 330004, Jiangxi, China; Computer School, Jiangxi University of Traditional Chinese Medicine, Nanchang 330004, Jiangxi, China
Source
Computer Applications and Software (《计算机应用与软件》)
Peking University Core Journal (北大核心)
2021, No. 3, pp. 159-162, 242 (5 pages)
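Funding
National Natural Science Foundation of China (61762051, 61562045)
Key Research and Development Program of the Jiangxi Provincial Department of Science and Technology (20171ACE50021, 20171BBG70108)
Science and Technology Research Project of the Jiangxi Provincial Department of Education (GJJ170747)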