Research on Long-distance Dependence Problem in Entity-relationship Recognition (cited by: 2)
Abstract Entity-relationship recognition (ERR) is a key step in information extraction. The traditional bag-of-words model suffers from the long-distance dependence problem (LDP) and therefore performs poorly on ERR. Conditional random fields (CRFs) offer flexible feature representation and are well suited to modeling complex linguistic phenomena. However, the conventional Linear-Chain CRF still cannot represent long-distance dependencies, while the Skip-Chain CRF considers only long-distance dependencies between identical words and is hard to extend because it is too expensive to compute. This paper proposes a new fully connected random field (FCRF) model. FCRF uses word similarity to establish dependencies between words and uses (average) mutual information to delete dependencies. The word-similarity formula is also extended so that it can express the distance between words. In this way the model overcomes the shortcomings of earlier statistical learning models in handling long-distance linguistic constraints, at a computational cost roughly comparable to that of the Linear-Chain CRF. Experiments show that FCRF outperforms both the Linear-Chain CRF and the Skip-Chain CRF on the ERR task.
Source Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, PKU Core, 2008, No. 2, pp. 364-367 (4 pages)
Funding Supported by the National Natural Science Foundation of China (60025206) and the Equipment Pre-Research Fund
Keywords entity-relationship recognition; long-distance dependence; full-connected random field; similarity computation
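
The edge-construction idea summarized in the abstract (similarity proposes a dependency, mutual information removes it) can be illustrated with a minimal sketch. Everything concrete here is an assumption, since the record does not give the paper's formulas: the function names `pmi_table`, `damped_similarity`, and `build_edges`, the damping form `base / (1 + decay * distance)`, both thresholds, and the toy data are hypothetical. Pointwise mutual information over sentence-level co-occurrence stands in for the paper's average mutual information, and a plain lookup table stands in for whatever lexical similarity measure the authors used.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(sentences):
    """Pointwise mutual information for word pairs that co-occur in a sentence."""
    n = len(sentences)
    word_counts, pair_counts = Counter(), Counter()
    for sent in sentences:
        vocab = sorted(set(sent))
        word_counts.update(vocab)
        pair_counts.update(combinations(vocab, 2))  # keys are sorted (a, b) tuples
    return {
        (a, b): math.log(c * n / (word_counts[a] * word_counts[b]))
        for (a, b), c in pair_counts.items()
    }


def damped_similarity(base, i, j, decay=0.1):
    """Hypothetical distance-aware similarity: base score damped by token gap."""
    return base / (1.0 + decay * abs(i - j))


def build_edges(tokens, sim, pmi, sim_min=0.5, mi_min=0.0):
    """Return the set of (i, j) dependency edges, i < j, for one sentence."""
    # Adjacent tokens are always linked, as in a linear chain (assumption).
    edges = {(i, i + 1) for i in range(len(tokens) - 1)}
    for i, j in combinations(range(len(tokens)), 2):
        if j - i < 2:
            continue
        a, b = sorted((tokens[i], tokens[j]))
        base = 1.0 if a == b else sim.get((a, b), 0.0)
        # Step 1: similarity proposes a long-distance edge.
        if damped_similarity(base, i, j) < sim_min:
            continue
        # Step 2: low mutual information vetoes it (identical words are kept).
        if a != b and pmi.get((a, b), float("-inf")) < mi_min:
            continue
        edges.add((i, j))
    return edges


# Toy corpus and similarity table, invented purely for illustration.
corpus = [
    ["alice", "joined", "acme"],
    ["alice", "works", "at", "acme"],
    ["bob", "joined", "initech"],
    ["bob", "left", "acme"],
]
similarity = {("acme", "alice"): 0.9, ("acme", "bob"): 0.9}
pmi = pmi_table(corpus)

kept = build_edges(["alice", "works", "at", "acme"], similarity, pmi)
pruned = build_edges(["bob", "left", "the", "acme"], similarity, pmi)
```

In this toy setup both sentences receive the same similarity score for their first and last tokens, but only the "alice ... acme" pair has positive mutual information in the corpus, so only that sentence keeps the long-distance edge `(0, 3)`. Pruning in this way keeps the graph sparse, which is one plausible reading of how the model stays close to linear-chain cost.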