Abstract
The results returned by commonly used search engines often contain a large number of duplicate web pages. To address this shortcoming, this paper proposes an algorithm, based on the message-digest algorithm MD5, for detecting and removing duplicate (approximate mirror) pages. Because MD5 is mature, the proposed algorithm is easy to implement and highly portable. Experiments show that it effectively removes duplicate pages from the results returned by common search engines, which improves retrieval efficiency for Internet users and gives the algorithm considerable practical value.
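The abstract does not describe the steps of the algorithm, so the following is only a minimal sketch of the general idea of digest-based duplicate removal, not the authors' method. It assumes each search result is a (url, text) pair, and the function names page_fingerprint and remove_duplicates, the whitespace and case normalization, and the example URLs are all hypothetical. Note that an exact MD5 comparison only catches pages that are identical after normalization; how the paper handles approximate mirrors is not stated here.

```python
import hashlib
import re


def page_fingerprint(text: str) -> str:
    """Return the MD5 hex digest of the page text.

    Assumption: whitespace is collapsed and case is ignored before hashing,
    so trivially different copies of a page map to the same digest.
    """
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def remove_duplicates(results):
    """Keep the first (url, text) pair seen for each distinct digest."""
    seen = set()
    unique = []
    for url, text in results:
        digest = page_fingerprint(text)
        if digest not in seen:
            seen.add(digest)
            unique.append((url, text))
    return unique


if __name__ == "__main__":
    # Hypothetical search results: the second entry mirrors the first.
    results = [
        ("http://a.example/1", "Hello  World"),
        ("http://b.example/mirror", "hello world\n"),
        ("http://c.example/2", "Other page"),
    ]
    for url, _ in remove_duplicates(results):
        print(url)
```

Comparing fixed-length digests instead of full page texts keeps the duplicate check to a set lookup per page, which is one plausible reason a message-digest approach is easy to implement and port.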
Source
Computer Technology and Development (《计算机技术与发展》)
2006, No. 6, pp. 222-223, 226 (3 pages)
Funding
Key Science and Technology Research Project of the Ministry of Education (教技司2001224号)
Keywords
message-digest algorithm
approximate mirror pages
information retrieval