Abstract
Entity-relationship recognition (ERR) is a key step in information extraction. The traditional bag-of-words model performs poorly on it because of the long-distance dependency problem (LDP). Conditional random fields (CRFs) express features flexibly and are therefore well suited to complex linguistic phenomena. However, the conventional Linear-Chain CRF still cannot represent long-distance dependencies, and the Skip-Chain CRF considers only long-distance dependencies between identical words and is hard to extend because it is too expensive to compute. This paper proposes a new fully connected random field (FCRF) model that uses word similarity to establish dependencies and the average mutual information of words to delete them. The word-similarity formula is also extended so that it can express dependence on distance. FCRF thereby overcomes the shortcomings of earlier statistical learning models on long-distance linguistic constraints while keeping a computational cost roughly equal to that of Linear-Chain CRF. Experiments show that FCRF outperforms both Linear-Chain CRF and Skip-Chain CRF on the ERR task.
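The edge-selection idea in the abstract (add a dependency when two words are similar, delete it when their mutual information is low) can be illustrated with a small sketch. The abstract does not give the paper's formulas, so everything concrete here is an assumption made for illustration: the cosine similarity with an exponential distance decay, the sentence-level co-occurrence estimate of average mutual information, the function names, and both thresholds are hypothetical and are not the authors' definitions.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def distance_aware_similarity(vec_i, vec_j, pos_i, pos_j, decay=0.1):
    """ASSUMED form: cosine similarity damped by the token distance.

    The paper extends its similarity formula to reflect distance, but the
    abstract does not state how; the exponential decay is a stand-in.
    """
    return cosine(vec_i, vec_j) * math.exp(-decay * abs(pos_i - pos_j))


def average_mutual_information(word_a, word_b, sentences):
    """Mutual information between the binary events 'word_a occurs in a
    sentence' and 'word_b occurs in a sentence', estimated from counts."""
    n = len(sentences)
    joint = Counter((word_a in s, word_b in s) for s in map(set, sentences))
    ami = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        p_x = sum(c for (xx, _), c in joint.items() if xx == x) / n
        p_y = sum(c for (_, yy), c in joint.items() if yy == y) / n
        ami += p_xy * math.log(p_xy / (p_x * p_y))
    return ami


def build_edges(tokens, vectors, corpus, sim_threshold=0.5, mi_threshold=0.2):
    """Return index pairs (i, j) that are similar enough to be linked and
    whose mutual information is high enough to survive pruning.

    Both thresholds are hypothetical tuning parameters.
    """
    edges = []
    for i, j in combinations(range(len(tokens)), 2):
        sim = distance_aware_similarity(vectors[tokens[i]], vectors[tokens[j]], i, j)
        if sim < sim_threshold:
            continue  # not similar enough: no dependency is created
        if average_mutual_information(tokens[i], tokens[j], corpus) < mi_threshold:
            continue  # weakly associated: the dependency is deleted
        edges.append((i, j))
    return edges


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    tokens = ["Alice", "works", "at", "Acme"]
    vectors = {"Alice": [1.0, 0.0], "works": [0.0, 1.0],
               "at": [0.0, 1.0], "Acme": [1.0, 0.1]}
    corpus = [["Alice", "Acme"], ["Alice", "Acme"], ["works"], ["at"]]
    print(build_edges(tokens, vectors, corpus))
```

In this toy setup the long-distance pair "Alice"/"Acme" passes both checks, while the adjacent pair "works"/"at" is similar but is pruned because the two words never co-occur in the toy corpus. The resulting sparse edge set is what would keep inference cost well below that of a truly fully connected graph.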
Source
《小型微型计算机系统》
CSCD
PKU Core Journal (北大核心)
2008, Issue 2, pp. 364-367 (4 pages)
Journal of Chinese Computer Systems
Funding
Supported by the National Natural Science Foundation of China (60025206)
Supported by the Equipment Pre-Research Foundation
Keywords
entity-relationship recognition
long-distance dependency
fully connected random field
similarity computation