Abstract
An examination of online news text shows that each article consists of a title and a body. Building on this structure, we propose a weighted term-frequency method that raises the weight of feature terms likely to signal a hot topic. News reports are then clustered with the Single-Pass algorithm to produce a list of topics. Drawing on the TF*PDF idea, we introduce a topic weight and propose a new method for computing topic heat, and we use a "topic index" to describe how a topic develops over time. Experiments show that the new heat calculation detects hot topics better than the original one, and that the derived topic trends agree with actual developments.
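The first two steps of the abstract (title-aware term weighting, then Single-Pass clustering) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything concrete here is an assumption made for illustration: the title weight of 3.0, the cosine similarity measure, the threshold of 0.3, the summed-vector centroid, and the function names `weighted_tf`, `cosine`, and `single_pass`.

```python
import math
from collections import Counter

TITLE_WEIGHT = 3.0   # assumed value; the abstract does not state the weight
SIM_THRESHOLD = 0.3  # assumed value; the abstract does not state the threshold


def weighted_tf(title_tokens, body_tokens, title_weight=TITLE_WEIGHT):
    """Count terms, letting each title occurrence count title_weight times."""
    counts = Counter()
    for term in body_tokens:
        counts[term] += 1.0
    for term in title_tokens:
        counts[term] += title_weight
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(vectors, threshold=SIM_THRESHOLD):
    """Assign each vector, in arrival order, to the most similar existing
    cluster, or open a new cluster if no similarity reaches the threshold."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [indices]}
    for index, vec in enumerate(vectors):
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(index)
            # Summed vector stands in for the centroid (cosine ignores scale).
            best["centroid"].update(vec)
        else:
            clusters.append({"centroid": Counter(vec), "members": [index]})
    return clusters


# Toy, made-up reports given as (title tokens, body tokens).
docs = [
    (["earthquake", "rescue"], ["earthquake", "hits", "city", "rescue", "teams"]),
    (["election", "results"], ["election", "votes", "counted", "results"]),
    (["earthquake", "aid"], ["aid", "arrives", "after", "earthquake"]),
]
vectors = [weighted_tf(title, body) for title, body in docs]
clusters = single_pass(vectors)
print([c["members"] for c in clusters])
```

Because Single-Pass never revisits earlier assignments, the resulting topic list depends on the order in which reports arrive and on the chosen threshold, so both would need tuning on real data.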
Source
《计算机应用与软件》
CSCD
PKU Core Journal (北大核心)
2013, Issue 12, pp. 311-314 (4 pages)
Computer Applications and Software