Study on theme-drift of hyperlink-induced topic search algorithm

Cited by: 5
Abstract: Hyperlink-Induced Topic Search (HITS) is a classic hyperlink-based algorithm. Because it relies purely on link structure, it ignores the text of the linked pages and does not distinguish the importance of individual links, so theme drift (topic drift) is unavoidable. To address this, the classic tf-idf algorithm is introduced on top of the original HITS algorithm: the relevance between each linked page and the query topic is computed and used to differentiate the importance of links. The improved algorithm makes the search engine's ranking results match the query more closely, and the corresponding precision is reported to improve greatly.
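Authors: Gao Qi (高琪), Zhang Yongping (张永平)
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list; 2009, No. 11, pp. 3100-3102, 3106 (4 pages)
Keywords: theme-drift; page ranking; search engine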
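
The idea in the abstract, weighting each link by the tf-idf relevance of the linked page to the query, can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything here is an assumption: relevance is taken to be the cosine similarity between tf-idf vectors of the query and the page text, each link's contribution is multiplied by the relevance of its target page, and the names `tfidf_relevance`, `normalize`, and `weighted_hits` are hypothetical.

```python
import math
from collections import Counter


def tfidf_relevance(pages, query):
    """Cosine similarity between the query and each page under tf-idf weighting."""
    tokens = {pid: text.lower().split() for pid, text in pages.items()}
    n = len(tokens)
    # Document frequency: number of pages that contain each term.
    df = Counter(t for toks in tokens.values() for t in set(toks))

    def vector(toks):
        tf = Counter(toks)
        # Smoothed idf (an assumption) so unseen query terms stay well defined.
        return {t: (c / len(toks)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
                for t, c in tf.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = (math.sqrt(sum(w * w for w in a.values()))
                * math.sqrt(sum(w * w for w in b.values())))
        return dot / norm if norm else 0.0

    q = vector(query.lower().split())
    return {pid: cosine(q, vector(toks)) for pid, toks in tokens.items()}


def normalize(scores):
    """Scale scores to unit L2 norm; an all-zero vector is left at zero."""
    norm = math.sqrt(sum(s * s for s in scores.values()))
    return {p: (s / norm if norm else 0.0) for p, s in scores.items()}


def weighted_hits(pages, links, query, iterations=50):
    """HITS iteration in which each link src -> dst is weighted by rel[dst]."""
    rel = tfidf_relevance(pages, query)
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: weighted sum of the hub scores of in-linking pages.
        new_auth = {p: 0.0 for p in pages}
        for src, dst in links:
            new_auth[dst] += rel[dst] * hub[src]
        auth = normalize(new_auth)
        # Hub update: weighted sum of the authority scores of linked pages.
        new_hub = {p: 0.0 for p in pages}
        for src, dst in links:
            new_hub[src] += rel[dst] * auth[dst]
        hub = normalize(new_hub)
    return auth, hub


# Toy data (invented for illustration): two hubs link to both candidate pages.
pages = {
    "car_page": "jaguar car engine speed",
    "cat_page": "jaguar cat jungle animal",
    "hub1": "links about cars",
    "hub2": "links about animals",
}
links = [("hub1", "car_page"), ("hub1", "cat_page"),
         ("hub2", "car_page"), ("hub2", "cat_page")]
authority, hub = weighted_hits(pages, links, "jaguar car")
print(sorted(authority, key=authority.get, reverse=True))
```

In plain HITS the two candidate pages in this toy graph would receive identical authority scores, because their in-links are identical. With the relevance weights, the page whose text better matches the query is favored, which is the kind of correction for theme drift the abstract describes.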

References (5)

  • 1. KLEINBERG J M. Authoritative sources in a hyperlinked environment[C]// Proceedings of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms. Philadelphia, PA, USA: Society for Industrial and Applied Mathematics, 1998: 668-677.
  • 2. LI Yue, LIU Fasheng. Research on the HITS algorithm based on link analysis[J]. Software Guide (软件导刊), 2008, 7(11): 70-72. Cited by: 5.
  • 3. LEMPEL R, MORAN S. The Stochastic Approach for Link-Structure Analysis (SALSA) and the TKC effect[J]. Computer Networks, 2000, 33(1/6): 387-401.
  • 4. LUO Xin, XIA Delin, YAN Puliu. Feature selection based on word-frequency difference and an improved TF-IDF formula[J]. Journal of Computer Applications (计算机应用), 2005, 25(9): 2031-2033. Cited by: 55.
  • 5. HATCHER E, GOSPODNETIC O. Lucene in Action[M]. Beijing: Publishing House of Electronics Industry, 2007.

Second-level references (6)

Co-citing documents (58)

Co-cited documents (39)

Citing documents (5)

Second-level citing documents (9)
