
Automatic abstract method based on Chinese sentence grouping (基于句群的自动文摘方法)
Cited by: 2
Abstract: Most current automatic abstracting methods take the sentence or the paragraph as the processing unit. This paper proposes an automatic abstracting method based on sentence groups instead. The method adopts an automatic Chinese sentence grouping theory based on multiple discriminant analysis (MDA). The resulting sentence groups are more semantically coherent across sentences and are therefore better suited as the processing granularity for automatic abstracting. On this basis, a latent Dirichlet allocation (LDA) topic model represents the text as a vector matrix, the k-means algorithm clusters the vectors, and the abstract is generated by extracting content from each cluster in a given proportion. Finally, the quality of the abstract is evaluated with the Kappa statistic and the Kendall correlation coefficient. The experimental results show that the overall Kappa value reaches 0.7 and the Kendall correlation coefficient is greater than 0.8, both above the thresholds for their respective "good" grades. Automatic abstracting with the sentence group as the processing granularity can therefore generate better abstracts than traditional methods that use the sentence as the processing granularity.
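The extraction and evaluation steps described in the abstract can be illustrated with a minimal sketch. It assumes the LDA and k-means steps have already produced one cluster label per sentence group, and that each group carries some salience score. The abstract does not say how units are ranked within a cluster, which Kappa variant is used, or which form of the Kendall coefficient is reported, so the score-based ranking, Cohen's kappa for two raters, and Kendall's tau-a are all assumptions, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import ceil


def extract_by_proportion(labels, scores, ratio):
    """Pick roughly `ratio` of the units from every cluster, highest score first.

    labels[i] is the cluster id of unit i; scores[i] is a hypothetical
    salience score. Indices are returned in original document order.
    """
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)
    chosen = []
    for members in clusters.values():
        quota = max(1, ceil(ratio * len(members)))  # at least one unit per cluster
        ranked = sorted(members, key=lambda i: scores[i], reverse=True)
        chosen.extend(ranked[:quota])
    return sorted(chosen)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items (assumed variant).

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs, no tie correction."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / pairs
```

For example, `extract_by_proportion([0, 0, 0, 1, 1, 2], [0.9, 0.2, 0.5, 0.1, 0.8, 0.4], 0.3)` keeps the top-scoring unit of each of the three clusters, so every topic cluster contributes to the abstract.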
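Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2016, Issue A01, pp. 58-62, 71 (6 pages)
Funding: National Natural Science Foundation of China (61202281, 61103101); Youth Fund of the Humanities and Social Sciences Research Program of the Ministry of Education (10YJCZH052, 12YJCZH201)
Keywords: automatic abstracting; sentence grouping; topic model; clustering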

References (22)

  • 1. GONG Y, LIU X. Generic text summarization using relevance measure and latent semantic analysis [C]// Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM Press, 2001: 19-25.
  • 2. YANG Xiaolan, ZHONG Yixin. Research and implementation of an automatic abstracting system based on text understanding [J]. Acta Electronica Sinica, 1998, 26(7): 155-158. Cited by: 17
  • 3. WANG Rongbo, LI Jie, HUANG Xiaoxi, ZHOU Changle, CHEN Zhiqun, WANG Xiaohua. Automatic Chinese sentence grouping method based on multiple discriminant analysis [J]. Journal of Computer Applications, 2015, 35(5): 1314-1319. Cited by: 4
  • 4. MIKOLOV T, SUTSKEVER I, CHEN K, et al. Distributed representations of words and phrases and their compositionality [C]// Neural Information Processing Systems 26. Cambridge: MIT Press, 2013: 3111-3119.
  • 5. MIKOLOV T, SUTSKEVER I, CHEN K, et al. Efficient estimation of word representations in vector space [J]. Eprint Arxiv, 2013, 26: 3111-3119.
  • 6. LUHN H P. The automatic creation of literature abstracts [J]. IBM Journal of Research and Development, 1958, 2(2): 159-165.
  • 7. MATHIS B A, RUSH J E. Abstracting encyclopedia of computer and technology [M]. New York: Marcel Dekker Inc., 1975, 1: 102-142.
  • 8. LIU Derong, WANG Yongcheng, LIU Chuanhan. Research on multi-document automatic summarization based on topic concepts [J]. Journal of the China Society for Scientific and Technical Information, 2005, 24(1): 69-74. Cited by: 7
  • 9. RAU L F, JACOBS P S, ZERNIK U. Information extraction and text summarization using linguistic knowledge acquisition [J]. Information Processing & Management, 1989, 25(4): 419-428.
  • 10. WU L, WEI X. Fudan abstract system of Chinese text [J]. Communications of COLIPS, 1996, 6(1): 35-39.

