Abstract
The Hyperlink-Induced Topic Search (HITS) algorithm is a classic hyperlink-based ranking algorithm. Because it relies purely on link structure, it ignores the textual content of the linked pages and does not distinguish between links of differing importance, which inevitably leads to topic drift. To address this problem, the improved algorithm builds on the original HITS algorithm and introduces the classic tf-idf algorithm: it computes the relevance between each linked page and the query topic and uses that relevance to distinguish the importance of links. The improved algorithm brings the search engine's ranking results more closely in line with the query, and the corresponding precision is also greatly improved.
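As a rough illustration of the idea, the following minimal Python sketch weights each link by the tf-idf cosine similarity between the linked page's text and the query, then runs the usual HITS power iteration over the weighted links. The abstract does not give the paper's exact formulas, so everything here is an assumption made for illustration: the function names `tfidf_relevance` and `weighted_hits`, whitespace tokenization, the smoothed idf, cosine similarity as the relevance measure, and the choice to scale both the authority and hub updates by the relevance of the link's target page.

```python
import math
from collections import Counter


def tfidf_relevance(query, docs):
    """Map each page to the cosine similarity between its tf-idf vector
    and the query's tf-idf vector (hypothetical helper)."""
    tokenized = {page: text.lower().split() for page, text in docs.items()}
    n = len(tokenized)
    # Document frequency: number of pages containing each term.
    df = Counter(t for tokens in tokenized.values() for t in set(tokens))
    # Smoothed idf (an assumption), so no term gets a zero or negative weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

    def vector(tokens):
        tf = Counter(tokens)
        # Terms unseen in the corpus have no idf and are dropped.
        return {t: tf[t] * idf[t] for t in tf if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    q = vector(query.lower().split())
    return {page: cosine(vector(tokens), q) for page, tokens in tokenized.items()}


def normalize(scores):
    """Scale scores to unit Euclidean length; leave an all-zero vector as is."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return {p: v / norm for p, v in scores.items()} if norm else dict(scores)


def weighted_hits(links, relevance, iterations=50):
    """HITS iteration in which each link src -> dst carries the weight
    relevance[dst] (an assumed weighting, not the paper's stated formula)."""
    pages = set(links) | {dst for targets in links.values() for dst in targets}
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: weighted sum of hub scores over incoming links.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += relevance.get(dst, 0.0) * hub[src]
        auth = normalize(new_auth)
        # Hub update: weighted sum of authority scores over outgoing links.
        new_hub = {
            p: sum(relevance.get(dst, 0.0) * auth[dst] for dst in links.get(p, ()))
            for p in pages
        }
        hub = normalize(new_hub)
    return auth, hub


# Toy data (invented): "off" has more in-links but no text in common with the query.
docs = {
    "hub": "links about hits ranking",
    "other": "food blog",
    "on": "hits link analysis ranking",
    "off": "cooking recipes pasta",
}
links = {"hub": ["on", "off"], "other": ["off"]}
relevance = tfidf_relevance("hits ranking", docs)
auth, hub = weighted_hits(links, relevance)
```

In this toy graph, plain HITS would favor "off" because it has two in-links, whereas the relevance weighting gives its links zero weight, so "on" ends up with the higher authority score. That is the kind of topic drift the abstract says the improved algorithm is meant to suppress.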
Source
《计算机应用》
CSCD
Peking University Core Journals (北大核心)
2009, Issue 11, pp. 3100-3102, 3106 (4 pages)
Journal of Computer Applications