Fusion of Multiple Features for Chinese Named Entity Recognition Based on Maximum Entropy Model  Cited by: 33
Abstract  As natural language processing (NLP) technology develops, automatic named entity recognition (NER) has become increasingly important for improving the performance of information extraction systems. NER, which plays a vital role in NLP, tags each named entity (NE) in a document with one of a set of NE types. This paper proposes a hybrid model for Chinese NER that is based on the maximum entropy model and fuses multiple features. It differs from most previous approaches in three respects. First, the maximum entropy model integrates diverse constraints effectively and is well suited to Chinese NER, so it serves as the basic framework. Second, local and global features are integrated in the hybrid model to achieve high performance. Third, heuristic human knowledge is introduced into the statistical model to reduce the search space and improve processing efficiency, which also increases recognition performance significantly. Experimental results on the test set of the SIGHAN 2008 NER evaluation task show that the hybrid model is an effective way to combine a statistical model with heuristic human knowledge. Experiments on a different test set confirm this conclusion and show that the method performs consistently across test data sources.
Source  Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list, 2008, No. 6, pp. 1004-1010 (7 pages)
Funding  National Natural Science Foundation of China (60773124); Shanghai Science and Technology Key Research Program (07dz15007)
Keywords  named entity recognition; maximum entropy model; local feature; global feature; heuristic human knowledge
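
The abstract names three ingredients: a maximum entropy model, a mix of local and global features, and heuristic knowledge that narrows the search space. The sketch below shows one way these pieces could fit together for a single character position. It is only an illustration under stated assumptions, not the authors' implementation: the tag set, the feature templates, the surname-list heuristic, and the hand-set weights are all hypothetical, and no training step is shown.

```python
import math
from collections import Counter

LABELS = ["PER", "LOC", "ORG", "O"]  # hypothetical tag set, not taken from the paper


def local_features(chars, i):
    """Local features: the current character and its immediate neighbours."""
    prev_c = chars[i - 1] if i > 0 else "<s>"
    next_c = chars[i + 1] if i + 1 < len(chars) else "</s>"
    return [f"cur={chars[i]}", f"prev={prev_c}", f"next={next_c}"]


def global_features(chars, i, doc_counts):
    """Global feature (assumed): the character occurs more than once in the document."""
    return ["recurs_in_doc"] if doc_counts[chars[i]] > 1 else []


def allowed_labels(chars, i, surname_chars):
    """Hypothetical heuristic: PER is a candidate only at or right after a surname character."""
    near_surname = chars[i] in surname_chars or (i > 0 and chars[i - 1] in surname_chars)
    return [y for y in LABELS if y != "PER" or near_surname]


def maxent_probs(features, labels, weights):
    """Maximum entropy form: p(y | x) = exp(sum of active feature weights for y) / Z(x)."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in features) for y in labels}
    top = max(scores.values())  # subtract the max score for numerical stability
    exp_scores = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exp_scores.values())
    return {y: v / z for y, v in exp_scores.items()}


def label_distribution(chars, i, weights, surname_chars):
    """Fuse local and global features, prune labels heuristically, then normalise."""
    doc_counts = Counter(chars)
    feats = local_features(chars, i) + global_features(chars, i, doc_counts)
    return maxent_probs(feats, allowed_labels(chars, i, surname_chars), weights)


# Illustrative, hand-set weights; a real system would estimate them from data.
chars = list("王明在上海")
weights = {("cur=王", "PER"): 2.0, ("next=海", "LOC"): 1.5}
surnames = {"王"}

for i, c in enumerate(chars):
    dist = label_distribution(chars, i, weights, surnames)
    print(c, {y: round(p, 3) for y, p in dist.items()})
```

Pruning labels before normalisation means the heuristic shrinks the set of candidates the model has to score, which is one plausible reading of "reducing the search space" in the abstract.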