摘要
针对预训练模型在处理新闻这种长文本时会截断一部分文本,导致文本信息缺失的问题,提出一种在融入新闻标题信息基础上将TextRank算法、隐含Dirichlet分布主题模型与预训练模型相结合的方法构建模型,并将该模型与其他语义相似度计算方法进行对比.结果表明,该模型准确率为82.46%,召回率为87.43%,精确率为82.68%,F 1值为84.99%,取得了最优结果,从而有效提高了新闻文本与评论的语义相似度计算性能.
Aiming at the problem that the pre-training model would cut off part of text when dealing with long text such as news,which led to the loss of text infomation,we proposed a method to build a model by combining TextRank algorithm,implicit Dirichlet distribution topic model and pre-training model on the basis of integrating news title information,and compared the model with other semantic similarity calculation methods.The results show that the accuracy rate of the model is 82.46%,the recall rate is 87.43%,the accuracy rate is 82.68%,and the F 1 value is 84.99%,the optimal results are obtained,which effectively improves the performance of semantic similarity calculation between news texts and comments.
作者
李伊仝
王红斌
程良
LI Yitong;WANG Hongbin;CHENG Liang(Faculty of Information Engineering and Automation,Kunming University of Science and Technology,Kunming 650504,China;College of City,Kunming University of Science and Technology,Kunming 650051,China)
出处
《吉林大学学报(理学版)》
CAS
北大核心
2022年第6期1399-1406,共8页
Journal of Jilin University:Science Edition
基金
国家自然科学基金(批准号:61966020)
云南省基础研究计划面上项目(批准号:CB22052C143A)
云南省教育厅科学研究基金(批准号:2018JS035).