
基于改进TF*PDF算法的网络新闻热点话题检测和跟踪 (cited by: 10)

NETWORK NEWS HOT TOPICS DETECTION AND TRACKING BASED ON MODIFIED TF*PDF ALGORITHM
Abstract: Web news articles consist of a title and a body. Based on this structure, the paper proposes a weighted term-frequency method that raises the weight of feature terms likely to belong to hot topics. News reports are then clustered with the Single-Pass clustering algorithm to obtain a topic list. Building on the idea of TF*PDF, the paper introduces a topic weight and proposes a new method for computing topic heat, and it uses a "topic index" to describe how a topic develops over time. Experiments show that the new heat calculation detects hot topics better than the original one, and that the derived topic trends agree with actual developments.
Authors: 迟呈英 (Chi Chengying), 李红 (Li Hong)
Source: Computer Applications and Software (《计算机应用与软件》), indexed in CSCD and the Peking University Core Journals list, 2013, No. 12, pp. 311-314 (4 pages)
Keywords: Single-Pass clustering; topic identification; hot topic; heat analysis
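
The first two steps described in the abstract, title-weighted term frequency followed by Single-Pass clustering, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the title weight, the similarity measure, or the clustering threshold, so `TITLE_WEIGHT`, `THRESHOLD`, the use of cosine similarity, and the summed-vector centroid are all assumptions made for illustration. The modified TF*PDF heat calculation is not sketched because its formula does not appear in this record.

```python
import math
from collections import Counter

TITLE_WEIGHT = 3.0  # assumed value; the abstract does not state one
THRESHOLD = 0.3     # assumed similarity threshold for joining a topic


def weighted_tf(title_tokens, body_tokens, title_weight=TITLE_WEIGHT):
    """Term frequencies in which each title occurrence counts title_weight times."""
    tf = Counter()
    for tok in body_tokens:
        tf[tok] += 1.0
    for tok in title_tokens:
        tf[tok] += title_weight
    return tf


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(docs, threshold=THRESHOLD):
    """Cluster (title_tokens, body_tokens) pairs in arrival order.

    Each report joins the most similar existing topic if the similarity
    reaches the threshold; otherwise it starts a new topic.
    """
    clusters = []
    for idx, (title, body) in enumerate(docs):
        vec = weighted_tf(title, body)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(idx)
            best["centroid"].update(vec)  # Counter.update adds the weights
        else:
            clusters.append({"centroid": Counter(vec), "members": [idx]})
    return clusters


docs = [
    (["earthquake", "rescue"], ["earthquake", "rescue", "team", "arrives"]),
    (["earthquake", "aid"], ["aid", "earthquake", "rescue", "supplies"]),
    (["football", "final"], ["football", "team", "wins", "final"]),
]
topics = [c["members"] for c in single_pass(docs)]
print(topics)
```

Because title terms carry more weight, two reports that share a headline term such as "earthquake" are pulled into the same topic even when their bodies differ, which is the effect the abstract attributes to the weighted term-frequency step.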
