Abstract
In machine reading comprehension (MRC), some questions cannot be answered from the given passage alone, and handling such unanswerable questions remains a challenge for natural language processing. Unlike previous work, which ignores what makes a question answerable or unanswerable, the semantic conflict detection-based MRC network (SCDNet) identifies no-answer (NA) questions by detecting semantic divergence between the question and the passage. Analysis shows that a passage fails to answer a question for two main reasons: the passage lacks the semantic information the question requires, or the semantic components of the two conflict. SCDNet therefore predicts the NA probability by checking whether the passage fully covers the semantics of the question. In addition, an answer length penalty is added to the loss function, which brings the training objective closer to the evaluation metric and improves performance. SCDNet models NA question identification and answer extraction as two subtasks of a joint model trained end to end. Experiments show that SCDNet predicts the NA category more accurately than the strong baseline SAN2.0, achieving an F1 score of 72.43 and an NA accuracy of 76.96 on the SQuAD 2.0 dataset.
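The joint objective described in the abstract can be sketched roughly as follows. This is an illustration only. The abstract does not give the form of the length penalty or any loss weights, so the function `joint_loss`, the weight `lam`, and the expected-span-length penalty are assumptions made for this sketch, not the authors' formulation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def joint_loss(start_scores, end_scores, na_logit,
               gold_start, gold_end, is_na, lam=0.1):
    """Hypothetical joint loss: NA classification + span extraction + length penalty.

    All names and the penalty form are assumptions for illustration.
    """
    # NA classification: binary cross-entropy on the no-answer probability.
    p_na = 1.0 / (1.0 + math.exp(-na_logit))
    na_loss = -math.log(p_na if is_na else 1.0 - p_na)
    if is_na:
        # No gold span exists, so only the NA classifier is trained.
        return na_loss

    # Span extraction: negative log-likelihood of the gold start and end.
    p_start = softmax(start_scores)
    p_end = softmax(end_scores)
    span_loss = -math.log(p_start[gold_start]) - math.log(p_end[gold_end])

    # Assumed length penalty: gap between the expected predicted span
    # length and the gold span length, in tokens.
    exp_start = sum(i * p for i, p in enumerate(p_start))
    exp_end = sum(i * p for i, p in enumerate(p_end))
    pred_len = exp_end - exp_start + 1
    gold_len = gold_end - gold_start + 1
    length_penalty = abs(pred_len - gold_len)

    return na_loss + span_loss + lam * length_penalty
```

Because token-level F1 depends on how much the predicted span overlaps the gold span, a penalty on length mismatch is one plausible way to bring the training objective closer to that metric; other formulations are equally possible.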
Authors
刘咏彬
王小捷
袁彩霞
易炼
LIU Yong-bin; WANG Xiao-jie; YUAN Cai-xia; YI Lian (School of Telecommunication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China; Alibaba (Beijing) Software Services Company Limited, Beijing 100022, China)
Source
《北京邮电大学学报》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2019, No. 6, pp. 126-133, 141 (9 pages)
Journal of Beijing University of Posts and Telecommunications
Funding
Fundamental Research Funds for the Central Universities (500419302).
Keywords
machine reading comprehension
question answering
unanswerable question