
Keyword Extraction from News Articles Based on PageRank Algorithm (cited by 15)
Abstract: Most existing keyword extraction methods based on complex networks ignore the characteristics of natural language when building the weighted text network, and they make little use of the classical algorithms of the complex-network field. Based on the PageRank algorithm, we propose a keyword extraction method for news articles named LTWPR (located and TF-weighted PageRank), which takes into account both term-frequency characteristics and human language habits. The algorithm introduces a term-frequency-shared weight, which distributes each node's term-frequency value among its links so that the edges between nodes are weighted by term frequency. It also defines a position weight coefficient to express the different importance of words appearing at different positions in a news article. LTWPR thus considers both the local and the global features of the text network. Extensive experiments on news articles crawled from Sina News show that LTWPR extracts keywords more effectively than existing methods and quickly and accurately covers the keywords tagged by the articles' authors.
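The abstract names the ingredients of LTWPR but not its formulas. The following is a minimal sketch of how those ingredients could fit together, assuming details the abstract does not give. It is not the authors' implementation. The hypothetical assumptions are: (1) the text network links words that co-occur within a sliding window, and words from the title and body form separate segments. (2) The term-frequency-shared weight of edge u→v is tf(u)·tf(v)/Σ tf(x) over u's neighbours x, so u's term frequency is split among its links in proportion to the neighbours' frequencies. (3) The position weight coefficient is a fixed boost (`title_boost`, here 2.0) for words in the title, applied through PageRank's teleport vector. The names `build_graph`, `weighted_pagerank`, `extract_keywords`, `window`, and `title_boost` are all illustrative.

```python
from collections import Counter, defaultdict


def build_graph(segments, window=2):
    """Undirected co-occurrence graph (assumption): link distinct words that
    appear within `window` consecutive tokens of the same segment."""
    neighbors = defaultdict(set)
    for seg in segments:
        for i, w in enumerate(seg):
            for j in range(i + 1, min(i + window, len(seg))):
                if seg[j] != w:
                    neighbors[w].add(seg[j])
                    neighbors[seg[j]].add(w)
    return neighbors


def weighted_pagerank(neighbors, tf, pos, d=0.85, iters=100, tol=1e-10):
    """PageRank with hypothetical TF-shared edge weights and a teleport
    vector proportional to each word's position weight."""
    nodes = sorted(tf)
    total_pos = sum(pos[n] for n in nodes)
    teleport = {n: pos[n] / total_pos for n in nodes}

    # Hypothetical TF-shared weight: node u shares its term frequency among
    # its links in proportion to the term frequency of each neighbour v.
    weight = {}
    for u in nodes:
        nb = neighbors.get(u, ())
        denom = sum(tf[v] for v in nb)
        for v in nb:
            weight[(u, v)] = tf[u] * tf[v] / denom
    out_sum = {u: sum(weight[(u, v)] for v in neighbors.get(u, ())) for u in nodes}

    rank = dict(teleport)
    for _ in range(iters):
        # Mass sitting on words with no links is redistributed via teleport.
        dangling = sum(rank[u] for u in nodes if out_sum[u] == 0)
        new = {}
        for v in nodes:
            # The graph is undirected, so v's in-neighbours are its neighbours.
            s = sum(rank[u] * weight[(u, v)] / out_sum[u] for u in neighbors.get(v, ()))
            new[v] = (1 - d) * teleport[v] + d * (s + dangling * teleport[v])
        delta = sum(abs(new[n] - rank[n]) for n in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def extract_keywords(title_tokens, body_tokens, k=3, window=2, title_boost=2.0):
    """Rank words and return the top k (ties broken alphabetically)."""
    tf = Counter(title_tokens + body_tokens)
    # Hypothetical position weight coefficient: title words get `title_boost`.
    pos = {w: 1.0 for w in tf}
    for w in title_tokens:
        pos[w] = title_boost
    neighbors = build_graph([title_tokens, body_tokens], window)
    rank = weighted_pagerank(neighbors, tf, pos)
    return sorted(rank, key=lambda w: (-rank[w], w))[:k]


if __name__ == "__main__":
    title = ["pagerank", "keyword"]
    body = ["pagerank", "graph", "keyword", "pagerank", "news"]
    print(extract_keywords(title, body, k=2))  # ['pagerank', 'keyword']
```

In the toy input, "pagerank" is the most frequent title word and has the most links, so it ranks first. Using the position coefficient as a teleport bias is only one plausible design. The paper may instead multiply it into the edge weights or into the final scores.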
Source: 《电子科技大学学报》 (Journal of University of Electronic Science and Technology of China), 2017, No. 5, pp. 777-783 (7 pages). Indexed in EI, CAS, CSCD; Peking University core journal.
Funding: Humanities and Social Sciences Research Planning Fund of the Ministry of Education of China (15YJZH016)
Keywords: complex networks; keyword extraction; natural language; PageRank; term-frequency-shared weight