A Similarity Calculation Method for News Clustering Based on Multi-Feature Fusion (cited by: 2)

A Similarity Calculation for News Clustering with Mixed
Abstract: With the growth of the Internet, online outlets have become the most important news medium. Online news reports spread widely and have a profound influence on society, so monitoring and mining Internet news events is of great value to governments and enterprises. One of the most important tasks in news analysis is to identify and group online news that is mixed in category and drawn from many sources. News grouping relies mainly on general-purpose clustering methods, and a basic building block of these is the computation of similarity between news reports. Depending on the requirement, a cluster may correspond to a single event or to a whole field. This paper targets event-level clustering of news reports and proposes a similarity calculation method based on mixed features. A vector space model (VSM) combining Tf-Idf and n-gram features yields a text similarity; rules then identify key information in the news text, such as time and place, and a key-information matching score is computed; finally, the two similarities are combined into the final matching score. Experiments show that the mixed-feature method clearly improves the precision and recall of event clustering.
Author: Li Junfeng (李俊峰)
Source: Software (《软件》), 2017, No. 12, pp. 170-174, 189 (6 pages in total)
Keywords: computer application technology; topic detection; clustering; text similarity
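
Illustrative sketch (not taken from the paper): the abstract outlines a three-step pipeline, namely a Tf-Idf plus n-gram vector space similarity, a rule-based key-information match on time and place, and a combination of the two scores. The Python below is a minimal sketch of that idea under several assumptions that the abstract does not confirm: word-level bigrams as the n-gram features, a smoothed idf formula, an ISO-style date regex and a tiny hypothetical place list `PLACES` standing in for the rules, Jaccard overlap as the key-information score, and a weighted sum with a hypothetical weight `alpha` as the combination step.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for the paper's rules: ISO-style dates and a tiny gazetteer.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
PLACES = {"beijing", "shanghai", "paris"}


def word_tokens(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def features(text, n=2):
    """Unigrams plus word n-grams (assumed feature set for the VSM)."""
    words = word_tokens(text)
    grams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return words + grams


def tfidf_vectors(docs, n=2):
    """Sparse Tf-Idf vectors (dicts) using one common smoothed idf variant."""
    bags = [Counter(features(d, n)) for d in docs]
    df = Counter()
    for bag in bags:
        df.update(bag.keys())
    total = len(docs)
    return [
        {t: tf * (math.log((1 + total) / (1 + df[t])) + 1.0) for t, tf in bag.items()}
        for bag in bags
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def key_info(text):
    """Extract (dates, places) with the simple rules defined above."""
    dates = set(DATE_RE.findall(text))
    places = {w for w in word_tokens(text) if w in PLACES}
    return dates, places


def jaccard(a, b):
    """Set overlap; two empty sets give 0.0 because there is no evidence of a match."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def key_info_similarity(text_a, text_b):
    """Average of date overlap and place overlap (an assumed scoring rule)."""
    dates_a, places_a = key_info(text_a)
    dates_b, places_b = key_info(text_b)
    return 0.5 * jaccard(dates_a, dates_b) + 0.5 * jaccard(places_a, places_b)


def combined_similarity(docs, i, j, alpha=0.6):
    """Weighted sum of text similarity and key-information similarity."""
    vectors = tfidf_vectors(docs)
    text_sim = cosine(vectors[i], vectors[j])
    return alpha * text_sim + (1 - alpha) * key_info_similarity(docs[i], docs[j])


if __name__ == "__main__":
    sample = [
        "Flood hits Beijing on 2017-12-01, thousands evacuated",
        "Thousands evacuated as flood hits Beijing on 2017-12-01",
        "Paris fashion week opens on 2017-09-26",
    ]
    print(combined_similarity(sample, 0, 1), combined_similarity(sample, 0, 2))
```

In this sketch two reports about the same event share both vocabulary and key information, so their combined score should exceed that of an unrelated report. For Chinese news text, a word segmenter or character n-grams would have to replace the `word_tokens` helper.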