Abstract
Applying graph models to multi-document summarization is an active research topic. The standard model builds an undirected graph with sentences as vertices and the similarity between sentences as edge weights. However, this model does not fully account for the weights of the terms within a sentence or for the document to which each sentence belongs. To address this, we propose a tri-layer graph model based on terms, sentences, and documents, which uses both term weights and sentence-document membership when computing sentence similarity. Experimental results on the DUC 2003 and DUC 2004 data sets show that the proposed method outperforms the LexRank model and the document-sensitive graph model (Document Sensitive Ranking).
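The baseline sentence graph described in the abstract can be illustrated with a minimal sketch. It is not the paper's tri-layer model: it only builds the undirected sentence graph and ranks vertices with a LexRank-style power iteration. The whitespace tokenizer, raw term-frequency cosine similarity, damping factor of 0.85, and the function names `cosine`, `build_graph`, and `rank` are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_graph(sentences):
    """Return a symmetric weight matrix: vertices are sentences,
    edge weights are pairwise similarities (assumed: whitespace tokens)."""
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(bags)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = cosine(bags[i], bags[j])
    return w


def rank(w, damping=0.85, iters=50):
    """PageRank-style power iteration over the weighted sentence graph.
    The damping value is an assumed default, not taken from the paper."""
    n = len(w)
    scores = [1.0 / n] * n
    out = [sum(row) for row in w]  # total edge weight at each vertex
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(
                w[j][i] / out[j] * scores[j] for j in range(n) if out[j] > 0
            )
            for i in range(n)
        ]
    return scores


sentences = [
    "graph models rank sentences for summarization",
    "sentences are ranked by graph models",
    "the weather was pleasant yesterday",
]
weights = build_graph(sentences)
scores = rank(weights)
```

In this toy input, the first two sentences share terms and reinforce each other, while the third has no edges and keeps only the baseline score. A model along the lines the abstract describes would additionally adjust the edge weights using term weights and document membership; how it does so is not specified in this record.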
Source
《中文信息学报》
CSCD
PKU Core (Peking University Core Journals)
2014, No. 6, pp. 201-207 (7 pages)
Journal of Chinese Information Processing
Funding
National Natural Science Foundation of China (61272212, 61163006, 61203313)
Keywords
graph model
multi-document summarization
sentence similarity
term-sentence-document graph