摘要
命名实体识别是自然语言处理的基础任务之一,目的是从非结构化的文本中识别出所需的实体及类型,其识别的结果可用于实体关系抽取、知识图谱构建等众多实际应用。近些年,随着深度学习在自然语言处理领域的广泛应用,各种基于深度学习的命名实体识别方法均取得了较好的效果,其性能全面超越传统的基于人工特征的方法。该文从三个方面介绍近期基于深度学习的命名实体识别方法:第一,从输入层、编码层和解码层出发,介绍命名实体识别的一般框架;第二,分析汉语命名实体识别的特点,着重介绍各种融合字词信息的模型;第三,介绍低资源的命名实体识别,主要包括跨语言迁移方法、跨领域迁移方法、跨任务迁移方法和集成自动标注语料的方法等。最后,总结相关工作,并提出未来可能的研究方向。
Named entity recognition(NER), as one of the basic tasks in natural language processing, aims to identify the required entities and their types in unstructured text. In recent years, various named entity recognition methods based on deep learning have achieved much better performance than that of traditional methods based on manual features. This paper summarizes recent named entity recognition methods from the following three aspects: 1) A general framework is introduced, which consists of an input layer, an encoding layer and a decoding layer. 2) After analyzing the characteristics of Chinese named entity recognition, this paper introduces Chinese NER models which incorporate both character-level and word-level information. 3) The methods for low-resource named entity recognition are described, including cross-lingual transfer methods, cross-domain transfer methods, cross-task transfer methods, and methods incorporating automatically labeled data. Finally, the conclusions and possible research directions are given.
作者
邓依依
邬昌兴
魏永丰
万仲保
黄兆华
DENG Yiyi;WU Changxing;WEI Yongfeng;WAN Zhongbao;HUANG Zhaohua(School of Software,East China Jiaotong University,Nanchang,Jiangxi 330013,China)
出处
《中文信息学报》
CSCD
北大核心
2021年第9期30-45,共16页
Journal of Chinese Information Processing
基金
国家重点研发计划(2018YFC0831106)
国家自然科学基金(61866012)
江西省自然科学基金(20181BAB202012)
江西省教育厅科学技术研究项目(GJJ180329)。