
Study on Adaptive Clustering of Paragraphs in Automatic Summarization System (times cited: 6)
Abstract: This paper presents an automatic summarization method based on automatic paragraph clustering. First, keyword vectors for the whole document and for each of its paragraphs are derived from word-frequency statistics and word-position features, and a paragraph-based vector space model of the document is built. Second, the pairwise similarity between paragraphs is computed, and the K-medoids clustering algorithm partitions the paragraphs into semantic segments; the number of clusters K is determined adaptively by a self-defined objective function. Finally, the sentences most relevant to the topic are selected from each semantic segment and arranged in their original document order to form the summary.
Source: 《微计算机信息》 (Control & Automation), 2006, Issue 06X, pp. 288-291 (4 pages); listed as a Peking University core journal
Funding: Supported by a national defense pre-research project under the Tenth Five-Year Plan
Keywords: automatic summarization; semantic paragraph partition; vector space model; clustering; K-medoids
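The abstract outlines a four-stage pipeline but gives no formulas, so the Python sketch below is only one possible reading of it under stated assumptions, not the authors' implementation. Assumptions: input is already word-segmented (a list of paragraphs, each a list of sentences, each a list of tokens); the position feature is a hypothetical rule in which POSITION_BOOST doubles every word in the first and last paragraphs; paragraph similarity is cosine similarity over keyword count vectors; clustering uses the simple alternating variant of K-medoids with a greedy BUILD-style start rather than the full PAM swap search; and because the paper's own objective function for K is not given, the average silhouette width is swapped in as a stand-in. All function names (keyword_weights, k_medoids, silhouette, choose_k, summarize) are hypothetical.

```python
import math
from collections import Counter

# ASSUMPTION: the abstract names a word-position feature but not its weights;
# here every word in the first or last paragraph simply counts double.
POSITION_BOOST = 2.0

def keyword_weights(paragraphs):
    # paragraphs -> sentences -> already-segmented tokens
    weights, last = Counter(), len(paragraphs) - 1
    for i, para in enumerate(paragraphs):
        factor = POSITION_BOOST if i in (0, last) else 1.0
        for sentence in para:
            for word in sentence:
                weights[word] += factor
    return weights

def vectorize(tokens, vocab):
    counts = Counter(tokens)  # raw counts restricted to the keyword vocabulary
    return [counts[w] for w in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = math.sqrt(sum(a * a for a in u)), math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def k_medoids(dist, k, max_iter=100):
    # Alternating K-medoids on a precomputed distance matrix (not full PAM swap).
    n = len(dist)
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]  # most central point
    while len(medoids) < k:  # greedy BUILD-style start
        def cost_with(c):
            return sum(min(dist[j][m] for m in medoids + [c]) for j in range(n))
        medoids.append(min((c for c in range(n) if c not in medoids), key=cost_with))

    def assign(meds):
        return [min(range(k), key=lambda c: dist[j][meds[c]]) for j in range(n)]

    for _ in range(max_iter):
        labels, new = assign(medoids), []
        for c in range(k):  # new medoid = member with least total distance
            members = [j for j in range(n) if labels[j] == c] or [medoids[c]]
            new.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new == medoids:
            break
        medoids = new
    return medoids, assign(medoids)

def silhouette(dist, labels, k):
    # Mean silhouette width; singletons (and points with no rival cluster) score 0.
    n, total = len(dist), 0.0
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        rivals = [g for g in ([j for j in range(n) if labels[j] == c]
                              for c in range(k) if c != labels[i]) if g]
        if not own or not rivals:
            continue
        a = sum(dist[i][j] for j in own) / len(own)
        b = min(sum(dist[i][j] for j in g) / len(g) for g in rivals)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n

def choose_k(dist, k_max):
    # ASSUMPTION: silhouette stands in for the paper's self-defined objective.
    n = len(dist)
    best = (float("-inf"), 1, [0] * n)  # fallback: one segment (n < 3)
    for k in range(2, min(k_max, n - 1) + 1):
        labels = k_medoids(dist, k)[1]
        score = silhouette(dist, labels, k)
        if score > best[0]:
            best = (score, k, labels)
    return best

def summarize(paragraphs, k_max=5, top_n=20):
    weights = keyword_weights(paragraphs)
    vocab = [w for w, _ in weights.most_common(top_n)]
    doc_vec = [weights[w] for w in vocab]  # document keyword vector
    vecs = [vectorize([w for s in p for w in s], vocab) for p in paragraphs]
    dist = [[1.0 - cosine(u, v) for v in vecs] for u in vecs]
    _, k, labels = choose_k(dist, k_max)
    picked = []
    for c in range(k):  # one sentence per semantic segment
        cands = [(pi, si) for pi, p in enumerate(paragraphs) if labels[pi] == c
                 for si in range(len(p))]
        if cands:  # sentence closest to the document keyword vector
            picked.append(max(cands, key=lambda ps: cosine(
                vectorize(paragraphs[ps[0]][ps[1]], vocab), doc_vec)))
    return [paragraphs[pi][si] for pi, si in sorted(picked)]  # original order

doc = [
    [["cluster", "paragraph", "summary"]],
    [["cluster", "medoid", "paragraph"], ["paragraph", "cluster"]],
    [["vector", "space", "model"]],
    [["vector", "model", "similarity"], ["space", "vector"]],
]
print(summarize(doc, k_max=3))
```

On this four-paragraph toy document, two segments score best (silhouette about 0.82 versus about 0.44 for three), so the sketch keeps one sentence from each segment and prints [['cluster', 'paragraph', 'summary'], ['vector', 'space', 'model']]. Silhouette was picked only because it is a standard, parameter-free way to compare clusterings with different K; any objective that penalizes both loose segments and over-splitting could fill the same slot.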