Summarization Based on Hidden Topic Markov Model with Multi-features (基于隐主题马尔科夫模型的多特征自动文摘)
Abstract: Building on the hidden topic Markov model (HTMM), the authors remove the topic-independence assumption of the latent Dirichlet allocation (LDA) topic model, so that summary generation can make full use of a document's structural information, and they combine this with content-based multi-features to improve summary quality. They further propose a strategy for extending single-document summarization to multi-document summarization without breaking document structure, and build a complete automatic summarization system. Experiments on the standard DUC2007 dataset show the advantage of HTMM and of the document features: compared with LDA, the HTMM-based system with multi-features achieves clearly higher ROUGE values.
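The ROUGE values mentioned in the abstract measure n-gram overlap between a system summary and a reference summary. As a rough illustration only, the following is a minimal sketch of ROUGE-N recall, assuming whitespace tokenization, lowercasing, and a single reference summary. The function names `ngrams` and `rouge_n_recall` are hypothetical, and this is not the official ROUGE toolkit nor necessarily the exact configuration the authors used.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: clipped n-gram matches divided by reference n-grams.

    Assumption: plain whitespace tokenization and lowercasing; real
    evaluations typically add stemming and multiple references.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each match at the number of times the n-gram occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    candidate = "the cat lay on the mat"
    print(rouge_n_recall(candidate, reference, n=1))
    print(rouge_n_recall(candidate, reference, n=2))
```

A higher value means the system summary recovers more of the reference summary's n-grams, which is the sense in which the abstract reports an improvement over LDA.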
Source: Acta Scientiarum Naturalium Universitatis Pekinensis (《北京大学学报(自然科学版)》), 2014, No. 1, pp. 187-193 (7 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
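Funding: Supported by the National Natural Science Foundation of China (61370130), the International Science and Technology Cooperation Program of the Ministry of Science and Technology (K11F100010), the Fundamental Research Funds for the Central Universities (2010JBZ2007), the Open Project of the Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences (IIP2010-4), and the Beijing Jiaotong University Talent Fund (2011RC034).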
Keywords: hidden topic Markov model; multi-features; multi-document summarization

