Abstract
This paper proposes an automatic summarization method based on automatic clustering of paragraphs. First, word-frequency statistics and word-position features are used to obtain a keyword vector for the document and for each of its paragraphs, and a paragraph-based vector space model of the document is built. Second, the similarity between paragraphs is computed, and the K-medoids clustering algorithm groups the paragraphs into semantic segments; the number of clusters, K, is determined adaptively by a self-defined objective function. Finally, the sentences most relevant to the topic are selected from each semantic segment and arranged in their original document order to form the summary.
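The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes plain whitespace tokenization and raw term frequencies in place of the paper's keyword weighting (which also uses word position), cosine similarity as the paragraph similarity measure, and a simple alternating K-medoids update. The abstract does not give the objective function used to choose K, so K is passed in as a parameter here. All function names (`tf_vector`, `cosine`, `k_medoids`, `cluster_paragraphs`) are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector over whitespace tokens (assumed stand-in weighting)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign(dist, medoids):
    """Label each item with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, k, max_iter=100):
    """Alternating K-medoids on a symmetric distance matrix.

    Initialization is deterministic (the first k items), which is an
    assumption made for reproducibility, not something the source specifies.
    """
    medoids = list(range(k))
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the medoid of an empty cluster
                continue
            # The new medoid minimizes total distance to the cluster's members.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


def cluster_paragraphs(paragraphs, k):
    """Group paragraphs into k clusters using distance = 1 - cosine similarity."""
    vectors = [tf_vector(p) for p in paragraphs]
    dist = [[1.0 - cosine(u, v) for v in vectors] for u in vectors]
    return k_medoids(dist, k)


paragraphs = [
    "cats purr and cats sleep",
    "stocks fell as markets closed",
    "cats sleep all day",
    "markets rallied and stocks rose",
]
medoids, labels = cluster_paragraphs(paragraphs, k=2)
print(medoids, labels)
```

In this toy input the two paragraphs about cats share one cluster and the two about markets share the other. A fuller sketch would add the adaptive choice of K and the final step of picking the most topic-relevant sentence from each cluster in document order.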
Source
《微计算机信息》
Peking University Core Journal (北大核心)
2006, Issue 06X, pp. 288-291 (4 pages)
Control & Automation
Funding
Supported by a national defense pre-research project of the Tenth Five-Year Plan ("十五")