Chinese Overlapping Named Entity Recognition Method Based on Label Clustering (基于标签聚类的中文重叠命名实体识别方法)
Abstract: To address the complex nesting of named entities and the overlapping boundaries of adjacent named entities caused by annotation errors in corpora, this paper proposes a method for Chinese overlapping Named Entity Recognition (NER). First, a hierarchical clustering algorithm based on random merging and splitting divides the labels of overlapping named entities into different clusters. This establishes a one-to-one relation between characters and entity labels and keeps the label clustering from becoming trapped in a local optimum. Then, within each label cluster, a Bidirectional Long Short-Term Memory-Conditional Random Field (BiLSTM-CRF) model that incorporates Chinese radicals is used to improve the stability of overlapping NER. Experimental results show that label clustering effectively prevents annotation errors from interfering with recognition, and that the method improves the F1 score by 0.05 on average compared with existing recognition methods.
Authors: WEN Xiuxiu (温秀秀); MA Chao (马超); GAO Yuanyuan (高原原); KANG Zilu (康子路) (Information Science Academy, China Electronics Technology Group Corporation, Beijing 100081, China)
Source: Computer Engineering (《计算机工程》), 2020, No. 5, pp. 41-46 (6 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Funding: National Key Research and Development Program of China, "Networked Operating System for Cloud Computing" (2016YFB1000500).
Keywords: Named Entity Recognition (NER); entity overlapping; Chinese named entity; label clustering; hierarchical clustering
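
The label clustering idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, whose details are not given in this record. It assumes that two labels "conflict" when their annotated spans share a character, that the goal is to find as few clusters as possible with no conflicting labels in the same cluster, and that a simple randomized search with merge and split moves is an acceptable stand-in. The function names, the parameters `steps`, `split_prob`, and `seed`, and the example spans are all hypothetical.

```python
import random
from itertools import combinations


def conflicting_labels(spans):
    """Return label pairs whose spans share at least one character.

    spans: list of (start, end, label) tuples, end exclusive (assumed format).
    Overlaps between two spans of the same label are ignored here.
    """
    conflicts = set()
    for (s1, e1, l1), (s2, e2, l2) in combinations(spans, 2):
        if l1 != l2 and s1 < e2 and s2 < e1:
            conflicts.add(frozenset((l1, l2)))
    return conflicts


def is_valid(cluster, conflicts):
    """A cluster is valid if no two of its labels conflict."""
    return all(frozenset(p) not in conflicts for p in combinations(cluster, 2))


def cluster_labels(labels, conflicts, steps=200, split_prob=0.2, seed=0):
    """Randomized merge/split search for a small conflict-free partition.

    Assumption: fewer clusters is better. Split moves undo earlier merges
    so the search is not stuck with its first greedy choices.
    """
    rng = random.Random(seed)
    clusters = [{label} for label in sorted(labels)]
    best = [set(c) for c in clusters]
    for _ in range(steps):
        if len(clusters) > 1 and rng.random() >= split_prob:
            # Merge move: join two random clusters if the result is valid.
            i, j = rng.sample(range(len(clusters)), 2)
            merged = clusters[i] | clusters[j]
            if is_valid(merged, conflicts):
                clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
                clusters.append(merged)
        else:
            # Split move: move one random label into its own cluster.
            candidates = [c for c in clusters if len(c) > 1]
            if candidates:
                chosen = rng.choice(candidates)
                moved = rng.choice(sorted(chosen))
                chosen.remove(moved)
                clusters.append({moved})
        if len(clusters) < len(best):
            best = [set(c) for c in clusters]
    return best


# Hypothetical annotations: a LOC span nested inside an ORG span.
spans = [(0, 2, "LOC"), (0, 4, "ORG"), (6, 8, "PER"), (9, 11, "TIME")]
conflicts = conflicting_labels(spans)
clusters = cluster_labels({label for _, _, label in spans}, conflicts)
print(clusters)
```

In this sketch each resulting cluster could be handed to its own sequence tagger, since within a cluster every character carries at most one entity label, provided no two spans of the same label overlap. The radical-enhanced BiLSTM-CRF model mentioned in the abstract is not reproduced here.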