Automatic Abstracting System Based on Improved LexRank Algorithm

Cited by: 15

Abstract: Automatic summarization is a key research topic in computational linguistics, and its study and application have attracted wide attention from related disciplines such as computer science, linguistics, and information science. This article first introduces the LexRank-based approach to automatic summarization. To address that method's shortcomings, it then improves the approach in three respects: sentence similarity computation, sentence weight computation, and redundancy resolution, so that the relevant influence factors can be adjusted dynamically according to the content of the input documents. The implemented system can summarize single or multiple documents in both Chinese and English. Experiments on the evaluation corpora from Harbin Institute of Technology and DUC show that the system improves summary quality to a certain degree over the original LexRank algorithm, and that in multi-document summarization it is comparatively insensitive to noise in the data, such as noise resulting from an imperfect topical clustering of documents. Finally, the article discusses open problems in automatic summarization research and points out the field's development trends.
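For orientation, the baseline that the abstract builds on can be sketched as follows. This is a minimal illustration of threshold-based LexRank in the style of Erkan and Radev, followed by a simple greedy redundancy filter. It is not the improved method of this paper, whose details the abstract does not give. The function names, the whitespace tokenizer, the smoothed IDF, the PageRank-style damping convention, and the values `threshold=0.1`, `damping=0.85`, and `redundancy_limit=0.5` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Build a sparse TF-IDF vector (a dict) per whitespace-tokenized sentence."""
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)
    df = Counter(word for tokens in tokenized for word in set(tokens))
    # Smoothed IDF (an assumption) so words found in every sentence keep weight.
    idf = {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}
    return [{w: tf * idf[w] for w, tf in Counter(tokens).items()}
            for tokens in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[w] for w, weight in u.items() if w in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def lexrank_scores(sim, threshold=0.1, damping=0.85, iterations=100):
    """Power iteration on the thresholded, row-normalized similarity graph."""
    n = len(sim)
    if n == 0:
        return []
    # Unweighted edge wherever similarity reaches the threshold (self-loops kept).
    adj = [[1.0 if sim[i][j] >= threshold else 0.0 for j in range(n)]
           for i in range(n)]
    degree = [sum(row) or 1.0 for row in adj]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(adj[j][i] / degree[j] * scores[j] for j in range(n))
            for i in range(n)
        ]
    return scores


def summarize(sentences, k=2, redundancy_limit=0.5):
    """Pick up to k central sentences, skipping near-duplicates of chosen ones."""
    vectors = tfidf_vectors(sentences)
    sim = [[cosine(u, v) for v in vectors] for u in vectors]
    scores = lexrank_scores(sim)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in ranked:
        # Hypothetical redundancy rule: reject sentences too similar to a pick.
        if all(sim[i][j] < redundancy_limit for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    return [sentences[i] for i in sorted(chosen)]  # restore document order


docs = [
    "the cat sat on the mat",
    "the cat sat on the mat",
    "dogs bark loudly at night",
    "a cat sat on a mat",
]
print(summarize(docs, k=2))
```

In this sketch the three places the abstract names as targets for improvement map onto `cosine` (sentence similarity), `lexrank_scores` (sentence weight), and the filter inside `summarize` (redundancy resolution). Chinese input would additionally need a word segmenter in place of `str.split`.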

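Authors: Ji Wenqian (纪文倩), Li Zhoujun (李舟军), Chao Wenhan (巢文涵), Chen Xiaoming (陈小明)

Affiliation: School of Computer Science, Beihang University (北京航空航天大学计算机学院)

Source: Computer Science (《计算机科学》), CSCD, PKU Core; 2010, No. 5, pp. 151-154, 218 (5 pages)

Funding: Supported by the National Natural Science Foundation of China (60573057, 60473057, 90604007)

Keywords: Automatic abstracting; LexRank; Sentence similarity; Dynamic adjustment; Redundancy resolution

Classification: TP393 [Automation and Computer Technology: Computer Application Technology]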
