Abstract
In natural language processing, named entity recognition (NER) is the first key step of information extraction. The NER task aims to identify named entities in large volumes of unstructured text and classify them into predefined types, providing basic support for downstream tasks such as relation extraction, text summarization, and machine translation. This paper first introduces the definition of NER, its main research difficulties, and the particularities of Chinese NER, and summarizes the Chinese and English public datasets and evaluation criteria commonly used for the task. Then, following the development history of NER, it surveys existing methods in three groups: early rule- and dictionary-based methods, statistical machine learning methods, and deep learning methods. For each group, it summarizes the key ideas, advantages and disadvantages, and representative models, and it also reviews the Chinese NER methods of each stage. In particular, it reviews the latest Transformer-based and prompt-learning-based NER methods, the two subcategories that represent the state of the art among deep learning-based approaches. Finally, it discusses the open challenges of NER research and future research directions.
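The task definition and the evaluation criteria mentioned in the abstract can be made concrete with a small sketch. It assumes the widely used BIO tagging scheme and exact-match, entity-level precision, recall, and F1 as used in CoNLL-style evaluation; the tag set, the helper names `bio_to_spans` and `entity_prf1`, and the example sentence are hypothetical illustrations, not taken from the surveyed paper, which may cover further criteria.

```python
from typing import List, Optional, Set, Tuple

# (entity type, start token index, end token index, exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Decode a BIO tag sequence into a set of typed entity spans (assumed BIO scheme)."""
    spans: Set[Span] = set()
    etype: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes an entity at the end
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # I- tag extending the current entity
        if etype is not None:
            spans.add((etype, start, i))  # close the current entity
            etype = None
        if tag.startswith(("B-", "I-")):
            # B- starts an entity; a stray I- is leniently treated as a start
            etype, start = tag[2:], i
    return spans


def entity_prf1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F1."""
    tp = len(gold & pred)  # a prediction counts only if type and boundaries both match
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical example sentence with gold and predicted tags (tokens shown for readability).
tokens = ["Zhang", "San", "works", "at", "Beijing", "Forestry", "University"]
gold_tags = ["B-PER", "I-PER", "O", "O", "B-ORG", "I-ORG", "I-ORG"]
pred_tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "B-ORG", "I-ORG"]

gold_spans = bio_to_spans(gold_tags)
pred_spans = bio_to_spans(pred_tags)
p, r, f1 = entity_prf1(gold_spans, pred_spans)
print(round(p, 3), round(r, 3), round(f1, 3))  # prints: 0.333 0.5 0.4
```

Because a prediction counts only when both its type and its boundaries match, the mis-split organization name in the example produces two false positives and one false negative, even though every token of it was tagged as part of some entity.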
Authors
LI Dongmei; LUO Sisi; ZHANG Xiaoping; XU Fu
School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China
Engineering Research Center for Forestry-Oriented Intelligent Information Processing, National Forestry and Grassland Administration, Beijing 100083, China
Institute of Information on Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing 100700, China
Source
Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》)
CSCD
Peking University Core Journal
2022, No. 9, pp. 1954-1968 (15 pages)
Funding
Fundamental Research Funds for the Central Public Welfare Research Institutes (ZZ140319-W)
National Natural Science Foundation of China (61772078)
Keywords
natural language processing
named entity recognition
machine learning
deep learning
relation extraction