Abstract
Existing models do not fully capture context, cannot handle topic sentence generation for different text types within a single model, and tend to generate repetitive content. To address these problems, a topic sentence generation method based on a pointer-generator network (PGN) that integrates key information is studied. Key sentences are selected by combining sentence sentiment-orientation weighted features with the TextRank iterative algorithm. Parameters are configured automatically according to the text type, and the BERT pre-trained language model is used to produce vector representations of the selected key sentences, which are fed into a PGN model that integrates the coverage and copy mechanisms to generate topic sentences. Post-processing is then applied to check and correct the content and length of the generated sentences, yielding the final topic sentence. Experimental results on the public LCSTS dataset show that the proposed model understands the source text more fully and effectively reduces repetitive content; its Rouge-1 and Rouge-L values are both higher than those of the baseline model.
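The key-sentence selection step can be illustrated with a minimal sketch. The abstract does not give the paper's weighting formula, so everything specific here is an assumption: sentences are taken to be already tokenized (Chinese text would first need word segmentation), `sentiment` is a hypothetical list of polarity scores in [-1, 1], and the fusion weight `alpha` and the function names are illustrative only. The sketch runs the standard TextRank iteration over a word-overlap similarity graph and then mixes the normalized rank with the absolute sentiment strength.

```python
from math import log


def sentence_similarity(a, b):
    """Word-overlap similarity from the original TextRank formulation."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # avoids log(1) + log(1) == 0 in the denominator
    return len(set(a) & set(b)) / (log(len(a)) + log(len(b)))


def rank_key_sentences(sentences, sentiment, top_k=2,
                       damping=0.85, alpha=0.5, iters=50, tol=1e-6):
    """Return indices (in document order) of the top_k key sentences.

    sentences: list of token lists; sentiment: assumed polarity per sentence.
    alpha is a hypothetical weight for the sentiment feature.
    """
    n = len(sentences)
    if n == 0:
        return []

    # Symmetric similarity graph without self-loops.
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sim[i][j] = sentence_similarity(sentences[i], sentences[j])
    out_sum = [sum(row) for row in sim]

    # Iterative TextRank update until convergence or the iteration cap.
    scores = [1.0 / n] * n
    for _ in range(iters):
        new_scores = []
        for i in range(n):
            rank = sum(sim[j][i] / out_sum[j] * scores[j]
                       for j in range(n) if out_sum[j] > 0)
            new_scores.append((1 - damping) + damping * rank)
        delta = max(abs(x - y) for x, y in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break

    # Assumed fusion: normalized TextRank score plus sentiment strength.
    max_score = max(scores) or 1.0
    fused = [(1 - alpha) * s / max_score + alpha * abs(p)
             for s, p in zip(scores, sentiment)]
    best = sorted(range(n), key=lambda i: fused[i], reverse=True)[:top_k]
    return sorted(best)


if __name__ == "__main__":
    docs = [
        ["model", "generates", "topic", "sentence"],
        ["topic", "sentence", "model", "uses", "key", "sentences"],
        ["weather", "is", "nice", "today"],
    ]
    print(rank_key_sentences(docs, [0.1, 0.2, 0.0], top_k=2))
```

Returning the indices in document order keeps the selected sentences in their original sequence before they are passed to a downstream encoder; whether the paper does the same is not stated in the abstract.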
Authors
葛斌
何春辉
黄宏斌
GE Bin; HE Chun-hui; HUANG Hong-bin (Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha 410073, China; R&D Department, Hunan Aike Human Resources Service Limited Company, Changsha 410208, China)
Source
《计算机工程与设计》
Peking University Core Journal (北大核心)
2022, Issue 6, pp. 1601-1608 (8 pages)
Computer Engineering and Design
Funding
National Natural Science Foundation of China (Grant Nos. 71971212, 61902417).
Keywords
information extraction
topic sentence generation (TSG)
pointer-generator network (PGN)
iterative algorithm
copy mechanism
deep learning
post-processing technology