Web Forum Thread Summarization Based on Dynamic Topic Modeling

Cited by: 8
Abstract: Because of their particular characteristics, Web forum threads currently lack an effective summarization method. This paper proposes a Web forum thread summarization method based on the latent Dirichlet allocation (LDA) topic model. To handle topic dependency and topic drift, we take the reply relations among posts into account during topic modeling and model each topic's distribution as a dynamic process that changes as the thread discussion evolves. We use the Gibbs EM sampling algorithm to estimate the parameters of the proposed dynamic topic model and determine the importance of each topic by summing its weight over all sentences. Finally, we score each sentence according to the topic probability distributions of the dynamic topic model and generate the summary of each thread from these scores. Experimental results on two different forum data sets show that the new method outperforms several widely used summarization methods on every ROUGE metric.
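The final two steps in the abstract (topic importance as the sum of a topic's weight over all sentences, then sentence scoring from the topic distributions) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It takes as given a per-sentence topic distribution `theta[s][k]`, a hypothetical input standing in for what the fitted dynamic topic model would supply after Gibbs EM estimation, and it assumes a sentence's score is the importance-weighted sum of its topic probabilities, because the abstract does not state the exact scoring formula. The function names are hypothetical.

```python
from typing import List, Sequence

# Hypothetical input: theta[s][k] = p(topic k | sentence s), assumed to come
# from the fitted dynamic topic model (the model itself is not built here).
Matrix = Sequence[Sequence[float]]


def topic_importance(theta: Matrix) -> List[float]:
    """Importance of topic k: the sum of its weight p(k | s) over all sentences s."""
    if not theta:
        return []
    num_topics = len(theta[0])
    return [sum(row[k] for row in theta) for k in range(num_topics)]


def sentence_scores(theta: Matrix, importance: Sequence[float]) -> List[float]:
    """ASSUMED scoring rule: importance-weighted sum of a sentence's topic probabilities."""
    return [sum(p * w for p, w in zip(row, importance)) for row in theta]


def summarize(sentences: Sequence[str], theta: Matrix, max_sentences: int) -> List[str]:
    """Pick the top-scoring sentences and return them in original thread order."""
    importance = topic_importance(theta)
    scores = sentence_scores(theta, importance)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore thread order
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    # Toy thread with two topics; the probabilities are invented for illustration.
    sentences = [
        "How do I reset my router?",
        "Hold the reset button for 10 seconds.",
        "Thanks, that worked!",
        "Off-topic: anyone watching the game tonight?",
    ]
    theta = [[0.8, 0.2], [0.9, 0.1], [0.6, 0.4], [0.1, 0.9]]
    print(summarize(sentences, theta, max_sentences=2))
    # ['How do I reset my router?', 'Hold the reset button for 10 seconds.']
```

Selected sentences are returned in their original thread order so the extract reads naturally. Ranking by score and then re-sorting by position is a common choice for extractive summaries, but the paper may order or trim sentences differently.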
Source: Journal of Computer Research and Development, 2012, No. 11, pp. 2359-2367 (9 pages). Indexed in: EI, CSCD, Peking University Core Journals.
Funding: National Natural Science Foundation of China (60970047, 61103151, 61173068); Natural Science Foundation of Shandong Province (ZR2012FM037)
Keywords: Web forum; forum thread summarization; topic modeling; Gibbs EM sampling; document summarization

References (12)

  • [1] Lin C, et al. Simultaneously modeling semantics and structure of threaded discussions: A sparse coding approach and its applications [C]//Proc of the 32nd SIGIR. New York: ACM, 2009.
  • [2] Blei D, Lafferty J. Dynamic topic models [C]//Proc of the 23rd ICML. New York: ACM, 2006.
  • [3] El-Arini K, et al. Turning down the noise in the blogosphere [C]//Proc of the 15th SIGKDD. New York: ACM, 2009.
  • [4] Iwata T, et al. Topic tracking model for analyzing consumer purchase behavior [C]//Proc of the 21st IJCAI. San Francisco: Morgan Kaufmann, 2009.
  • [5] Lin C Y, Hovy E. Automatic evaluation of summaries using n-gram co-occurrence statistics [C]//Proc of HLT-NAACL. Cambridge: MIT Press, 2003.
  • [6] Miao Jia, Ma Jun, Chen Zhumin. A Blog summarization method based on the HITS algorithm [J]. Journal of Chinese Information Processing, 2011, 25(1): 104-109. (in Chinese)
  • [7] Blei D, Ng A, Jordan M. Latent Dirichlet allocation [J]. The Journal of Machine Learning Research, 2003, 3(1): 993-1022.
  • [8] Wallach H. Topic modeling: Beyond bag-of-words [C]//Proc of the 23rd ICML. New York: ACM, 2006.
  • [9] [2011-07-18]. http://discussions.info.apple.com.
  • [10] [2011-07-17]. http://slashdot.org.

