Abstract
Neural-network-based textual entailment recognition models usually learn inference knowledge only from the training data, which limits their generalization ability. This paper proposes a Chinese Knowledge Enhanced Inference Model (CKEIM) that incorporates external semantic knowledge. Word-level semantic knowledge features are extracted according to the characteristics of the HowNet knowledge base and used to construct an attention weight matrix. In parallel, word similarity features and hypernym-hyponym features are selected from the Tongyici CiLin synonym knowledge base to form a feature vector. The attention weight matrix, the feature vector, and the encoded text vectors are then combined and integrated into the training of the neural network model to enhance recognition of Chinese textual entailment. Experimental results show that, compared with the Enhanced Sequential Inference Model (ESIM), CKEIM improves recognition accuracy by 3.7%, 1.5%, and 0.9% when trained on 15%, 50%, and 100% of the CNLI training set, respectively, indicating better Chinese textual entailment recognition performance and generalization ability.
Authors
LI Shibao (李世宝)
LI He (李贺)
ZHAO Qingshuai (赵庆帅)
YIN Lele (殷乐乐)
LIU Jianhang (刘建航)
HUANG Tingpei (黄庭培)
College of Oceanography and Space Informatics, China University of Petroleum (East China), Qingdao, Shandong 266580, China
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
Peking University Core Journal (北大核心)
2021, Issue 1, pp. 44-49 (6 pages)
Funding
National Natural Science Foundation of China (61972417, 61872385)
Fundamental Research Funds for the Central Universities (18CX02134A, 19CX05003A-4, 18CX02137A)
Keywords
Chinese textual entailment (中文文本蕴含)
natural language inference (自然语言推理)
attention mechanism (注意力机制)
Bi-directional Long Short-Term Memory (BiLSTM) network (双向长短期记忆网络)
HowNet (知网)
CiLin (词林)