Abstract
Automatic summarization is an important task in natural language processing. This paper proposes a method for automatic summarization of Chinese text based on extracting key sentences from local topics. The document is first divided into local topic units by hierarchical segmentation, and a certain number of sentences are then extracted from each unit as summary sentences. Because the semantic structure of the document is analyzed in advance, the method effectively avoids redundancy in the summary and the tendency to overlook sparsely distributed topics. Experimental results show that the method is effective.
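The segment-then-extract pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the hierarchical segmentation or the sentence scoring, so the sketch swaps in a flat split at low-similarity sentence boundaries and ranks sentences by cosine similarity to their unit's word counts. The names `tokenize`, `segment`, `summarize`, `threshold`, and `per_unit` are hypothetical, and whitespace tokenization stands in for real Chinese word segmentation.

```python
import math
from collections import Counter


def tokenize(sentence):
    # Assumption: whitespace tokens stand in for real Chinese word segmentation.
    return sentence.lower().split()


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def segment(sentences, threshold=0.1):
    # Flat stand-in for topic segmentation: start a new unit wherever two
    # adjacent sentences share too little vocabulary.
    if not sentences:
        return []
    units, current = [], [0]
    for i in range(1, len(sentences)):
        prev_bag = Counter(tokenize(sentences[i - 1]))
        this_bag = Counter(tokenize(sentences[i]))
        if cosine(prev_bag, this_bag) < threshold:
            units.append(current)
            current = []
        current.append(i)
    units.append(current)
    return units


def summarize(sentences, per_unit=1, threshold=0.1):
    # Take the top `per_unit` sentences from every unit, so that small
    # topics are still represented, and return them in document order.
    chosen = []
    for unit in segment(sentences, threshold):
        centroid = Counter(t for i in unit for t in tokenize(sentences[i]))
        ranked = sorted(
            unit,
            key=lambda i: cosine(Counter(tokenize(sentences[i])), centroid),
            reverse=True,
        )
        chosen.extend(ranked[:per_unit])
    return [sentences[i] for i in sorted(chosen)]
```

Drawing a fixed number of sentences from each unit is what keeps a short topic from being crowded out by a long one. A global ranking over all sentences would not give that guarantee.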
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
Peking University Core Journals (北大核心)
2008, No. 22, pp. 49-51 (3 pages)
Funding
National Natural Science Foundation of China (60773167, 60673040)
Keywords
automatic summarization
topic segmentation
local topic unit