Abstract
In full-length wuxia (martial-arts) novels, the protagonists are mostly knights-errant and righteous heroes with distinct personalities, and a character's nickname often summarizes his or her most prominent trait. Traditional named entity recognition has focused on person, place, and organization names, and nickname recognition has not yet been studied. Nicknames are nonetheless an indispensable element of wuxia fiction, and recognizing them can inform related tasks such as synonym recognition. This paper therefore proposes a method for recognizing the nicknames attached to character names in wuxia novels. The method combines out-of-vocabulary (OOV) extension recognition and screening with fixed sentence-pattern rules. The OOV extension recognition and screening method expands and screens left-neighbor strings, and it introduces the concepts of competing nickname substrings and candidate nickname substrings. The fixed sentence-pattern rules use nickname indicator words to screen the candidate nickname substrings within an observation window. Through statistics and classification, the paper also derives a high-frequency word list for wuxia novels and a low-frequency indicator dictionary, which are used to screen the competing nickname substrings. Experiments show that the method is feasible and effective.
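The abstract names the stages of the method but gives no algorithmic details. The following is a minimal Python sketch, not the authors' implementation, of how left-neighbor expansion, frequency screening, resolution of competing substrings, and an indicator-word window check could fit together. Everything concrete here is an assumption: the function names, the punctuation set, the thresholds (`max_len=4`, `min_count=2`, `window=4`), the suffix-based rule in `resolve_competing`, and the toy word list and indicator set stand in for the paper's high-frequency word list and indicator dictionary.

```python
import re
from collections import Counter

# Hypothetical punctuation set: a left-neighbor string is not expanded across these.
PUNCT = re.compile(r"[\s,。!?;:、“”《》,.!?;:]")


def left_neighbor_counts(text: str, name: str, max_len: int = 4) -> Counter:
    """Expand left-neighbor strings of length 2..max_len before each occurrence of `name`."""
    counts: Counter = Counter()
    pos = text.find(name)
    while pos != -1:
        for n in range(2, max_len + 1):
            if pos - n < 0:
                break
            sub = text[pos - n:pos]
            if PUNCT.search(sub):  # stop expanding once punctuation is reached
                break
            counts[sub] += 1
        pos = text.find(name, pos + 1)
    return counts


def screen(counts: Counter, high_freq_words: set, min_count: int = 2) -> dict:
    """Drop rare strings and strings containing a (toy) high-frequency word."""
    return {s: c for s, c in counts.items()
            if c >= min_count and not any(w in s for w in high_freq_words)}


def resolve_competing(cands: dict) -> dict:
    """Hypothetical rule: drop a substring if a longer, equally frequent candidate ends with it."""
    return {s: c for s, c in cands.items()
            if not any(t != s and t.endswith(s) and cands[t] == c for t in cands)}


def has_indicator(text: str, target: str, indicators: set, window: int = 4) -> bool:
    """Fixed-pattern check: an indicator word occurs in the window right before `target`."""
    pos = text.find(target)
    while pos != -1:
        if any(ind in text[max(0, pos - window):pos] for ind in indicators):
            return True
        pos = text.find(target, pos + 1)
    return False


def recognize_nicknames(text: str, name: str, high_freq_words: set, indicators: set) -> list:
    """Chain the hypothetical stages and return the confirmed nicknames for `name`."""
    cands = resolve_competing(screen(left_neighbor_counts(text, name), high_freq_words))
    return sorted(s for s in cands if has_indicator(text, s + name, indicators))


text = "江湖人称及时雨宋江,及时雨宋江仗义疏财。众好汉都说及时雨宋江好。"
print(recognize_nicknames(text, "宋江", {"说", "的", "了"}, {"人称", "外号", "绰号"}))  # ['及时雨']
```

On this toy sentence, "时雨" and "及时雨" both precede "宋江" three times and compete; the sketch keeps the longer one, and the indicator "人称" in the window before "及时雨宋江" confirms it. The paper's actual screening relies on its statistically derived word list and dictionary rather than these hand-picked sets.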
Authors
唐锋
梁循
赵晓磊
张旋
程恒超
TANG Feng; LIANG Xun; ZHAO Xiaolei; ZHANG Xuan; CHENG Hengchao (School of Information, Renmin University of China, Beijing 100872, China)
Source
Journal of Chinese Information Processing (《中文信息学报》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2019, No. 8, pp. 132-142 (11 pages)
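Funding
Open Project of the State Key Laboratory of Digital Publishing Technology, Peking University Founder Group Co., Ltd.
National Natural Science Foundation of China (71531012, 71271211)
Beijing Natural Science Foundation (4172032)
Research Funds of Renmin University of China (Fundamental Research Funds for the Central Universities), project 19XNH120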
Keywords
nickname recognition
competing nickname substring
high-frequency word list
fixed sentence-pattern rules