Abstract
In long-text semantic matching, the contextual dependencies between word vectors are difficult to capture and a text may cover more than one topic, which often leads to poor matching performance. To address this problem, this paper proposes a long-text semantic matching method based on BERT and a dense composite network. Densely connecting the BERT embeddings with the composite network significantly improves the accuracy of long-text semantic matching. First, each sentence pair is fed into the pre-trained BERT model, which produces accurate word-vector representations through iterative feedback and, from them, high-quality semantic information for the sentence pair. Second, a dense composite network is designed: a bidirectional long short-term memory network (Bi-LSTM) first captures the global semantic information of the sentence pair, and TextCNN then extracts and integrates local semantic information to obtain the key features of each sentence and the correspondence between the two sentences. The hidden outputs of BERT and the Bi-LSTM are fused with the pooled output of TextCNN. Finally, aggregating the association states between the networks during training effectively prevents network degradation and strengthens the model's discriminative ability. Experimental results on the community question answering (CQA) long-text dataset show that the proposed method achieves an average improvement of 45%.
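The abstract describes the data flow (BERT, then Bi-LSTM, then TextCNN, with BERT and Bi-LSTM outputs fused with the TextCNN pooled output) but gives no layer sizes, kernel widths, or exact fusion rule. The NumPy sketch below is therefore only an illustration under stated assumptions, not the authors' implementation: random vectors stand in for BERT token outputs of a packed sentence pair, the Bi-LSTM states are concatenated with the BERT outputs before TextCNN (one plausible reading of "dense connection"), and the fused vector joins the first ([CLS]) BERT vector, max-pooled Bi-LSTM states and the TextCNN pooled features. All function names, sizes and the two-class softmax head are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the toy run is reproducible


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_lstm(d_in, hidden):
    """Random weights for one LSTM direction: input, recurrent and bias terms for 4 gates."""
    return (rng.normal(0, 0.1, (4 * hidden, d_in)),
            rng.normal(0, 0.1, (4 * hidden, hidden)),
            np.zeros(4 * hidden))


def lstm_pass(x, W, U, b, hidden):
    """Run one LSTM direction over x of shape (T, d_in); return all states, shape (T, hidden)."""
    h, c = np.zeros(hidden), np.zeros(hidden)
    states = np.zeros((x.shape[0], hidden))
    for t in range(x.shape[0]):
        i, f, o, g = np.split(W @ x[t] + U @ h + b, 4)  # input, forget, output gates + candidate
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        states[t] = h
    return states


def bi_lstm(x, fwd, bwd, hidden):
    """Global context: forward states concatenated with backward states, shape (T, 2*hidden)."""
    h_f = lstm_pass(x, *fwd, hidden)
    h_b = lstm_pass(x[::-1], *bwd, hidden)[::-1]  # run on reversed sequence, then re-align
    return np.concatenate([h_f, h_b], axis=1)


def text_cnn(x, kernels):
    """Local features: 1-D convolution over time (ReLU), then max-pooling over time per kernel."""
    pooled = []
    for K in kernels:  # K has shape (n_filters, width, d_in)
        width = K.shape[1]
        windows = np.stack([x[t:t + width] for t in range(x.shape[0] - width + 1)])
        feature_map = np.maximum(np.einsum("twd,fwd->tf", windows, K), 0.0)
        pooled.append(feature_map.max(axis=0))
    return np.concatenate(pooled)


def dense_composite_forward(h_bert, params):
    """Hypothetical forward pass returning match / no-match probabilities."""
    h_lstm = bi_lstm(h_bert, params["fwd"], params["bwd"], params["hidden"])
    dense_in = np.concatenate([h_bert, h_lstm], axis=1)  # assumed dense link: BERT output reused
    cnn_vec = text_cnn(dense_in, params["kernels"])
    # Assumed fusion: [CLS] vector + max-pooled Bi-LSTM states + TextCNN pooled features.
    fused = np.concatenate([h_bert[0], h_lstm.max(axis=0), cnn_vec])
    logits = params["W_out"] @ fused
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()


# Toy sizes (BERT-base would give d_bert = 768); random vectors stand in for BERT token
# outputs of a packed pair "[CLS] question [SEP] answer [SEP]".
T, d_bert, hidden, n_filters, widths = 20, 32, 16, 8, (2, 3, 4)
h_bert = rng.normal(size=(T, d_bert))
d_dense = d_bert + 2 * hidden
params = {
    "hidden": hidden,
    "fwd": init_lstm(d_bert, hidden),
    "bwd": init_lstm(d_bert, hidden),
    "kernels": [rng.normal(0, 0.1, (n_filters, w, d_dense)) for w in widths],
}
fused_dim = d_bert + 2 * hidden + n_filters * len(widths)
params["W_out"] = rng.normal(0, 0.1, (2, fused_dim))
probs = dense_composite_forward(h_bert, params)
print(probs)  # two untrained class probabilities that sum to 1
```

Feeding the BERT output both into the TextCNN input and directly into the final fused vector gives later layers a shortcut to earlier representations, in the spirit of DenseNet-style connections; this is one way such links could help against the network degradation the abstract mentions, though the paper's exact wiring may differ.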
Authors
CHEN Yue-lin (陈岳林)
GAO Zhu-cheng (高铸成)
CAI Xiao-dong (蔡晓东)
(School of Mechanical and Electrical Engineering, Guilin University of Electronic Technology, Guilin 541000, China; School of Information and Communication, Guilin University of Electronic Technology, Guilin 541000, China)
Source
Journal of Jilin University: Engineering and Technology Edition (《吉林大学学报(工学版)》)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
2024, No. 1, pp. 232-239 (8 pages)
Funding
Guangxi Innovation-Driven Development Special Project (Guike AA20302001).