PGN text topic sentence generation method based on key information (融合关键信息的PGN文本主题句生成方法)
Abstract: Existing models cannot fully understand context, cannot handle automatic topic sentence generation for different text types at the same time, and tend to generate repetitive content. To address these problems, a PGN text topic sentence generation method that integrates key information is studied. Key sentences are selected by combining sentence sentiment-orientation weighted features with the TextRank iterative algorithm. Parameters are configured automatically according to the text type, and the BERT pre-trained language model is used to produce vector representations of the selected key sentences, which are fed into a pointer generation network (PGN) model that integrates the coverage copy mechanism to generate the topic sentence. Post-processing is then applied to check and correct the content and length of the generated topic sentence, yielding the final topic sentence. Experimental results on the public LCSTS dataset show that the proposed model understands the source text more fully and effectively reduces the generation of repeated content; its Rouge-1 and Rouge-L values are both higher than those of the baseline model.
Authors: GE Bin (葛斌); HE Chun-hui (何春辉); HUANG Hong-bin (黄宏斌) (Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha 410073, China; R&D Department, Hunan Aike Human Resources Service Limited Company, Changsha 410208, China)
Source: Computer Engineering and Design (《计算机工程与设计》), PKU Core Journal, 2022, No. 6, pp. 1601-1608 (8 pages)
Funding: National Natural Science Foundation of China (71971212, 61902417).
Keywords: information extraction; topic sentence generation (TSG); pointer generation network (PGN); iterative algorithm; copy mechanism; deep learning; post-processing technology
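
The key-sentence selection step described in the abstract (TextRank combined with sentence sentiment weights) might look roughly like the sketch below. The abstract does not say how the two signals are combined, so the multiplicative blend, the `alpha` parameter, the word-overlap similarity, and all function names here are illustrative assumptions, not the authors' formulation. Sentences are assumed to be pre-tokenized, and the sentiment weights are assumed to come from some external scorer.

```python
import math


def overlap_similarity(a, b):
    """Word-overlap similarity from the original TextRank formulation."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # avoids a zero denominator, since log(1) == 0
    shared = len(set(a) & set(b))
    return shared / (math.log(len(a)) + math.log(len(b)))


def textrank_scores(sentences, damping=0.85, iterations=50):
    """Score tokenized sentences by power iteration on the similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    weights = [
        [overlap_similarity(sentences[i], sentences[j]) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Rank flowing into sentence i from every sentence j linked to it.
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            updated.append((1 - damping) / n + damping * rank)
        scores = updated
    return scores


def select_key_sentences(sentences, sentiment_weights, top_k=2, alpha=0.5):
    """Return indices of the top_k sentences, in document order.

    ASSUMPTION: the blend score * (1 + alpha * weight) is hypothetical;
    the source does not specify how sentiment features enter the ranking.
    """
    base = textrank_scores(sentences)
    combined = [s * (1 + alpha * w) for s, w in zip(base, sentiment_weights)]
    ranked = sorted(range(len(sentences)), key=lambda i: combined[i], reverse=True)
    return sorted(ranked[:top_k])
```

In this sketch the selected indices are returned in document order so that the key sentences can be concatenated as input to a downstream encoder without reordering the source text.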