Abstract
Extracting relations between named entities is an important problem in information extraction. This paper proposes a method for automatically extracting named entity relations from a large text collection. The method finds all named entity pairs that occur in the same sentence within a bounded word distance of each other, and converts their contexts into vectors. A small number of entity pairs that hold the target relation are selected by hand as the initial relation seed set, which is then expanded automatically through self-learning (bootstrapping). The entity pairs to be extracted are obtained by computing the context similarity between candidate entity pairs and the relation seeds. Expanding the seed set improves both recall and precision. In tests on the People's Daily (PFR) corpus, the method achieves a weighted average F-score of 0.813.
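The seed-expansion loop described in the abstract can be sketched roughly as follows. This is a minimal illustration only, not the paper's implementation: the abstract does not specify the vector weighting, the similarity measure, the distance window, or the acceptance threshold, so the bag-of-words counts, cosine similarity, the `threshold` value, and all function names here are assumptions.

```python
import math
from collections import Counter


def context_vector(tokens):
    """Bag-of-words count vector for the context of an entity pair (assumed weighting)."""
    return Counter(tokens)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (assumed similarity measure)."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def bootstrap(candidates, seeds, threshold=0.5, max_rounds=10):
    """Expand a seed set with candidate pairs whose context resembles a seed's context.

    candidates: dict mapping (entity1, entity2) -> list of context tokens
    seeds: hand-picked pairs known to hold the target relation
    threshold, max_rounds: hypothetical parameters, not taken from the paper
    """
    vectors = {pair: context_vector(toks) for pair, toks in candidates.items()}
    accepted = set(seeds)
    for _ in range(max_rounds):
        # A candidate is accepted if it is similar enough to any current seed.
        new_pairs = {
            pair
            for pair, vec in vectors.items()
            if pair not in accepted
            and any(
                cosine(vec, vectors[s]) >= threshold
                for s in accepted
                if s in vectors
            )
        }
        if not new_pairs:
            break  # the seed set has stopped growing
        accepted |= new_pairs
    return accepted


# Toy data (invented for illustration; not from the PFR corpus).
candidates = {
    ("Alice", "Acme"): ["works", "at", "as", "engineer"],
    ("Bob", "Globex"): ["works", "at", "as", "manager"],
    ("Carol", "Initech"): ["employed", "as", "manager", "of"],
    ("Dave", "Paris"): ["visited", "last", "summer"],
}
result = bootstrap(candidates, seeds=[("Alice", "Acme")])
```

In this toy run, the pair ("Carol", "Initech") is not similar enough to the original seed but is picked up in a later round once ("Bob", "Globex") has joined the seed set, which is the kind of recall gain the abstract attributes to seed expansion.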
Source
《计算机工程》
EI
CAS
CSCD
PKU Core Journals (北大核心)
2006, No. 21, pp. 183-184, 193 (3 pages)
Computer Engineering
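Funding
National Natural Science Foundation of China (Grant No. 60442005)
Key Project of the Science and Technology Research Fund of the Ministry of Education (Grant No. 105117)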
Keywords
Named entity
Relation extraction
Self-learning (bootstrapping)