
News Topic Detection on Chinese Microblog Based on Rough Set and Word Co-Occurrence (cited by: 1)
Abstract: Traditional word co-occurrence methods for detecting topics in microblogs suffer from high computational complexity, low recall, and low precision. To address these problems, this paper proposes an improved word co-occurrence algorithm based on rough set theory (RSCW). The method builds a word co-occurrence matrix from word co-occurrence relations, finds the maximal complete subgraphs in that matrix and uses them as topic cluster centers, and finally applies rough set theory to identify the keyword set of each topic. Experiments on the NLPIR microblog content corpus and on a microblog data set collected in real time show that the method can effectively detect breaking news in large-scale microblog data and improve the rate at which breaking news is identified.
Authors: Lan Tian (兰天), Guo Gongde (郭躬德)
Source: Computer Systems & Applications (《计算机系统应用》), 2016, No. 6, pp. 17-24 (8 pages)
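Funding: National Natural Science Foundation of China (61070062, 61175123); Fujian Province Major Science and Technology Project for University-Industry Cooperation (2010H6007)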
Keywords: microblog; word co-occurrence graph; rough set; topic detection
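
The three stages named in the abstract (co-occurrence matrix, maximal complete subgraphs as topic centers, rough-set keyword sets) can be illustrated with a minimal Python sketch. It is not the authors' RSCW implementation. The abstract does not give the exact rough-set formulation, so the sketch assumes a tolerance rough set model in which a word's tolerance class is the word plus its co-occurrence neighbours. The function names, the `min_count` threshold, and the toy posts are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(posts):
    """Count how many posts contain each unordered word pair."""
    counts = Counter()
    for words in posts:
        for a, b in combinations(sorted(set(words)), 2):
            counts[(a, b)] += 1
    return counts


def build_graph(counts, min_count):
    """Keep pairs seen in at least min_count posts as undirected edges."""
    graph = {}
    for (a, b), n in counts.items():
        if n >= min_count:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal complete subgraphs with basic Bron-Kerbosch."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return found


def approximations(graph, target):
    """Lower/upper approximation of a word set (assumed tolerance model)."""
    lower, upper = set(), set()
    for word, neighbours in graph.items():
        tolerance_class = neighbours | {word}
        if tolerance_class <= target:
            lower.add(word)   # every related word lies inside the topic
        if tolerance_class & target:
            upper.add(word)   # at least one related word lies in the topic
    return lower, upper


# Toy, made-up posts (already tokenized); not data from the paper.
posts = [
    ["earthquake", "rescue", "sichuan"],
    ["earthquake", "rescue", "sichuan", "donation"],
    ["earthquake", "sichuan", "rescue"],
    ["donation", "charity"],
    ["donation", "charity", "rescue"],
    ["concert", "ticket"],
    ["concert", "ticket"],
]

graph = build_graph(cooccurrence_counts(posts), min_count=2)
for clique in maximal_cliques(graph):
    lower, upper = approximations(graph, set(clique))
    print(sorted(clique), sorted(lower), sorted(upper))
```

Under this assumed model, the lower approximation holds words that belong to a topic unambiguously, and the upper approximation adds words that are only possibly related. In the toy data, "rescue" falls outside the lower approximation of the earthquake topic because it also co-occurs with "donation".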