Abstract
Knowledge bases (KBs) have been widely adopted for question answering (QA) in recent years. Using a KB to return the correct answer to a natural-language question is known as the KBQA problem. A KB may be incomplete, however: it may lack the answer itself, or some of the entities and relations mentioned in the question, and this limits the overall performance of existing KBQA models. To address this, the paper proposes a new model that draws on a text corpus in two ways: as a source of additional information that extends KB coverage, and as background information that enriches the question representation. The model consists of three modules: an entity and question representation module, a document and enhanced-question representation module, and an answer prediction module. The first module learns entity representations from the retrieved KB subgraph and then updates the question representation by fusing in seed-entity information. The second module learns representations of the documents relevant to the given question and then further refines the question representation by fusing in document information. The third module predicts the answer from the KB entity representations, the document representations, and the updated question representation. Extensive experiments on the WebQuestionsSP dataset show that the proposed method achieves higher accuracy than the other methods compared.
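The abstract describes the three modules only at a high level. The following is a minimal sketch of how such a pipeline could be chained together, not the authors' implementation. Every function name, the toy vectors, the attention-style fusion, and the dot-product scoring are assumptions made for illustration; the paper itself may use entirely different encoders and scoring functions.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(question, vectors):
    """Assumed fusion step: add an attention-weighted average of `vectors`
    to the question vector. Returns a copy if there is nothing to fuse."""
    if not vectors:
        return list(question)
    weights = softmax([dot(question, v) for v in vectors])
    context = [sum(w * v[i] for w, v in zip(weights, vectors))
               for i in range(len(question))]
    return [q + c for q, c in zip(question, context)]


def entity_question_module(question, entities, seed_ids):
    """Module 1 (sketch): entity vectors are taken as given here;
    the question is updated with seed-entity information."""
    seeds = [entities[e] for e in seed_ids]
    return entities, fuse(question, seeds)


def document_question_module(question, documents):
    """Module 2 (sketch): document vectors are taken as given here;
    the question is refined with document information."""
    return documents, fuse(question, list(documents.values()))


def predict_answer(question, candidates, documents, mentions):
    """Module 3 (sketch): score each candidate entity by its KB vector
    plus the documents that mention it, and return the best one."""
    def score(entity_id):
        kb_score = dot(question, candidates[entity_id])
        text_score = sum(dot(question, documents[d])
                         for d in mentions.get(entity_id, []))
        return kb_score + text_score
    return max(candidates, key=score)


def answer(question, entities, seed_ids, documents, mentions):
    """Chain the three modules; seed entities are excluded as answers."""
    entities, q = entity_question_module(question, entities, seed_ids)
    documents, q = document_question_module(q, documents)
    candidates = {e: v for e, v in entities.items() if e not in seed_ids}
    return predict_answer(q, candidates, documents, mentions)


# Toy data (hypothetical): one seed entity, two candidates, one document.
question = [1.0, 0.0]
entities = {"seed": [1.0, 0.0], "a": [0.5, 0.5], "b": [0.0, 1.0]}
documents = {"d1": [1.0, 0.2]}
mentions = {"a": ["d1"]}  # document d1 mentions candidate "a"

print(answer(question, entities, ["seed"], documents, mentions))  # prints a
```

In this toy setup the document evidence is what separates the two candidates most strongly, which mirrors the abstract's motivation: text can supply support for an answer that the KB subgraph alone covers only weakly.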
Authors
FENG Chengcheng (冯程程)
LIU Pai (刘派)
JIANG Linying (姜琳颖)
MEI Xiaohan (梅笑寒)
GUO Guibing (郭贵冰)
Affiliations: School of Software, Northeastern University, Shenyang 110000, China; School of Engineering, Westlake University, Hangzhou 310000, China; School of Software, University of Maryland, Maryland MD 20740, USA
Source
Computer Science (《计算机科学》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2023, Issue 3, pp. 266-275 (10 pages)
Funding
National Natural Science Foundation of China (61972078)
Shenyang Science and Technology Plan Project (21-108-9-19)