Multi-document summary using LDA and spectral clustering

Cited by: 8
Abstract: Automatic summarization aims to compress lengthy documents into a few short paragraphs that present the information comprehensively and concisely, improving the efficiency and accuracy with which users obtain information. The proposed method builds on Latent Dirichlet Allocation (LDA): Gibbs sampling is used to estimate the topic-word probability distribution and the sentence-topic probability distribution, and the LDA parameters are then combined with a spectral clustering algorithm to extract a multi-document summary. Sentence weights are integrated with a linear formula, and a 400-word multi-document summary is extracted. Summary quality is evaluated on the DUC2002 dataset with the ROUGE automatic summarization evaluation toolkit; the results show that the method effectively improves summary quality.
Authors: Fu Ling (付玲), Zhang Hui (张晖)
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD-indexed, 2013, No. 16, pp. 142-145, 154 (5 pages)
Funding: National High Technology Research and Development Program of China (863 Program), No. 2007AA01Z151
Keywords: Latent Dirichlet Allocation (LDA); Gibbs sampling; spectral clustering; multi-document summary
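The abstract says Gibbs sampling is used to estimate the topic-word distribution and the sentence-topic distribution, but this page gives neither the sampler's details nor its settings. The following is a minimal sketch of a standard collapsed Gibbs sampler for LDA, assuming each sentence is treated as one LDA "document". The function name `lda_gibbs` and the values of `K`, `alpha`, `beta` and `iters` are hypothetical illustration choices, not the authors' settings. The function returns `phi[k][w]` (probability of word w under topic k) and `theta[d][k]` (probability of topic k in sentence d).

```python
import random


def lda_gibbs(sentences, K, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA, treating each sentence as a document.

    sentences: list of token lists.
    Returns (phi, theta, vocab) where phi[k][w] estimates P(word w | topic k)
    and theta[d][k] estimates P(topic k | sentence d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    docs = [[word_id[w] for w in s] for s in sentences]

    n_dk = [[0] * K for _ in docs]        # tokens of topic k in sentence d
    n_kw = [[0] * V for _ in range(K)]    # times word w is assigned topic k
    n_k = [0] * K                         # tokens assigned to topic k
    z = []                                # current topic of every token

    # Random initial topic assignment.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    topics = range(K)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z_i = t | all other assignments).
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                           / (n_k[t] + V * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                # Record the newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Point estimates from the final sample.
    phi = [[(n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)]
           for k in topics]
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in topics]
             for d, doc in enumerate(docs)]
    return phi, theta, vocab
```

Using only the final sample keeps the sketch short; estimates are often averaged over several samples taken after a burn-in period instead.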
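This page does not say exactly how the LDA parameters feed the spectral clustering step. Purely as an assumption for illustration, the sketch below builds a sentence-similarity graph from per-sentence topic vectors (for example, rows of the sentence-topic matrix estimated by LDA) using cosine similarity. It then splits the graph in two by the sign of the Fiedler vector, which is the eigenvector of the graph Laplacian L = D - W with the second-smallest eigenvalue. That vector is found by power iteration on cI - L with the constant vector projected out, so only the standard library is needed. The function names are hypothetical. A k-way clustering would normally run k-means on several leading eigenvectors instead of using this two-way split.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def spectral_bipartition(vectors, iters=500, seed=0):
    """Two-way spectral split using the Fiedler vector of L = D - W."""
    n = len(vectors)
    # Similarity graph W with zero diagonal, and degree vector D.
    W = [[0.0 if i == j else cosine(vectors[i], vectors[j]) for j in range(n)]
         for i in range(n)]
    deg = [sum(row) for row in W]
    # Eigenvalues of L lie in [0, 2 * max degree], so c*I - L is positive
    # semidefinite and its top eigenvector (off the constant vector) is the
    # Fiedler vector of L.
    c = 2 * max(deg) + 1e-9
    rng = random.Random(seed)
    v = [rng.uniform(-1, 1) for _ in range(n)]
    for _ in range(iters):
        # Project out the constant vector (eigenvalue 0 of L).
        mean = sum(v) / n
        v = [x - mean for x in v]
        # Multiply by c*I - L: (c - deg_i) * v_i + sum_j W_ij * v_j.
        v = [(c - deg[i]) * v[i] + sum(W[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        v = [x / norm for x in v]
    # Cluster label from the sign of each Fiedler-vector entry.
    return [0 if x >= 0 else 1 for x in v]
```

For example, `spectral_bipartition([[0.9, 0.1], [0.85, 0.15], [0.1, 0.9], [0.2, 0.8]])` places the first two vectors in one cluster and the last two in the other. Which cluster receives label 0 depends on the sign the iteration settles on.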
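The abstract states that sentence weights are integrated with a linear formula and that a 400-word summary is extracted, but the formula's features and coefficients are not given on this page. The sketch below makes a purely illustrative assumption: each sentence already has a dictionary of feature scores, and the feature names `topic` and `position`, their weights, and the functions `linear_score` and `select_summary` are all hypothetical. Each sentence is scored as a weighted sum. The highest-scoring sentences that still fit the 400-word budget are then added greedily, and the selected sentences are returned in their original order.

```python
def linear_score(features, weights):
    """Linear combination: sum of weights[name] * features[name]."""
    return sum(weights[name] * value for name, value in features.items())


def select_summary(sentences, feature_rows, weights, budget=400):
    """Greedy extraction under a word budget (hypothetical helper).

    sentences: list of sentence strings.
    feature_rows: one {feature_name: score} dict per sentence.
    """
    ranked = sorted(range(len(sentences)),
                    key=lambda i: linear_score(feature_rows[i], weights),
                    reverse=True)
    chosen, used = [], 0
    for i in ranked:
        n_words = len(sentences[i].split())
        # Skip sentences that would overflow the budget, but keep trying
        # shorter, lower-ranked ones.
        if used + n_words <= budget:
            chosen.append(i)
            used += n_words
    # Present the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]
```

Here a sentence that would overflow the budget is skipped rather than ending the selection, so a shorter, lower-ranked sentence can still fill the remaining space. Stopping at the first overflow, or truncating, would be equally valid choices. Redundancy control is not modeled in this sketch.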
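Summary quality is measured on DUC2002 with the ROUGE toolkit. The official package has further options, such as stemming and stopword removal, that are not reproduced here. As a rough illustration of what a ROUGE-N recall score counts, the minimal sketch below computes the clipped n-gram overlap between a candidate summary and one or more reference summaries, then divides by the total number of reference n-grams. It lowercases the text and splits on whitespace, which is a simplification. The function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=1):
    """ROUGE-N recall: clipped n-gram matches / total reference n-grams."""
    cand = ngrams(candidate.lower().split(), n)
    match = total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        # Each reference n-gram is credited at most as often as it
        # appears in the candidate.
        match += sum(min(count, cand[g]) for g, count in ref_counts.items())
        total += sum(ref_counts.values())
    return match / total if total else 0.0
```

For instance, `rouge_n_recall("the cat sat on the mat", ["the cat is on the mat"], n=1)` gives 5/6, and the same call with `n=2` gives 0.6.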