Abstract
Traditional page ranking algorithms favor old pages, so old pages often appear near the top of the results when a static ranking algorithm is applied to the dynamic Web. To improve such algorithms, this paper introduces temporal link analysis: the last-modified time returned in the HTTP response when the crawler fetches a page is used as the timestamp of the page and of its links, and the number and time of a page's in-links and out-links are considered together to compute the page's weight. The resulting WTPR algorithm makes new pages rise and old pages decline in the ranking, while high-quality old pages still receive higher rank values than ordinary old pages.
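The idea in the abstract can be illustrated with a minimal sketch. This is not the WTPR formula, which the abstract does not give. It assumes a hypothetical exponential `freshness` decay with a made-up `half_life_days` parameter, and it splits each page's rank among its out-links in proportion to the freshness of the target page. Only the use of the HTTP `Last-Modified` value as the timestamp comes from the abstract.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def freshness(last_modified: str, now: datetime, half_life_days: float = 365.0) -> float:
    """Map an HTTP Last-Modified value to a weight in (0, 1].

    Assumption: exponential decay with a hypothetical half-life; the
    paper's actual time weighting is not stated in the abstract.
    """
    modified = parsedate_to_datetime(last_modified)
    age_days = max((now - modified).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / half_life_days)


def time_weighted_rank(links, modified, now, damping=0.85, iterations=50):
    """PageRank-style iteration with freshness-weighted link shares.

    links:    dict mapping a page to the list of pages it links to
    modified: dict mapping a page to its Last-Modified header string
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    fresh = {p: freshness(modified[p], now) for p in pages}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if not outs:
                # Dangling page: spread its rank uniformly over all pages.
                for q in pages:
                    new_rank[q] += damping * rank[p] / len(pages)
                continue
            total = sum(fresh[q] for q in outs)
            for q in outs:
                # Fresher targets receive a larger share of p's rank.
                new_rank[q] += damping * rank[p] * fresh[q] / total
        rank = new_rank
    return rank
```

With two otherwise identical pages linked from the same hub, the more recently modified one ends up with the higher rank in this sketch, which mirrors the qualitative behavior the abstract describes.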
Source
《计算机应用研究》
CSCD
Peking University Core Journals (北大核心)
2009, Issue 7, pp. 2438-2441, 2477 (5 pages in total)
Application Research of Computers
Funding
Supported by the National Natural Science Foundation of China (60773049)
Keywords
PageRank algorithm
Web pages
Web data mining