
The Study on the Duplicated Web Pages Detection Algorithm Based on the Keyword from User’s Submission (cited by 6)
Abstract: Building on a study of traditional feature-code-based duplicate detection algorithms, and targeting the duplicate web pages that appear in meta search engine results, this paper proposes a duplicate detection method based on the keywords of the user's query, with the aim of improving the retrieval quality of meta search engines. The paper describes the main steps of the algorithm's implementation and verifies its effectiveness experimentally.
Source: New Technology of Library and Information Service (《现代图书情报技术》), indexed in CSSCI and the Peking University Core Journals list, 2008, No. 7, pp. 43-46 (4 pages).
Keywords: duplicate detection; meta search; feature code; Chinese word segmentation
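The abstract names the approach but does not spell out its steps. The following is a minimal Python sketch of one way a query-keyword-based feature-code check could work; it is not the authors' algorithm. The sentence delimiters, the MD5 "feature codes", the Jaccard comparison, the 0.8 threshold, and the function names `feature_codes`, `jaccard` and `deduplicate` are all illustrative assumptions. Plain substring matching stands in for the Chinese word segmentation listed among the keywords.

```python
import hashlib
import re

# Assumed sentence delimiters: Chinese and ASCII end-of-sentence punctuation.
_SENTENCE_SPLIT = re.compile(r"[。！？；!?;\n]+")
_WHITESPACE = re.compile(r"\s+")


def feature_codes(text, keywords):
    """Return MD5 hashes of the sentences in `text` that mention a query keyword."""
    kws = [_WHITESPACE.sub("", k) for k in keywords if k.strip()]
    codes = set()
    for sentence in _SENTENCE_SPLIT.split(text):
        # Drop whitespace so that layout differences do not change the hash.
        normalized = _WHITESPACE.sub("", sentence)
        # Substring matching stands in for real Chinese word segmentation.
        if normalized and any(k in normalized for k in kws):
            codes.add(hashlib.md5(normalized.encode("utf-8")).hexdigest())
    return codes


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as 0 (no evidence)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def deduplicate(pages, keywords, threshold=0.8):
    """Return URLs of the pages to keep, in their original ranking order.

    `pages` is a list of (url, text) pairs; a page is dropped when its feature
    codes are at least `threshold`-similar to those of a page already kept.
    """
    kept = []  # (url, feature codes) of pages retained so far
    for url, text in pages:
        codes = feature_codes(text, keywords)
        if any(jaccard(codes, seen) >= threshold for _, seen in kept):
            continue  # near-duplicate of a higher-ranked page
        kept.append((url, codes))
    return [url for url, _ in kept]


if __name__ == "__main__":
    pages = [
        ("http://a.example/1", "元搜索引擎会返回重复网页。今天天气很好。"),
        ("http://b.example/2", "元搜索引擎 会返回重复网页。广告内容不同。"),
        ("http://c.example/3", "网页去重可以提高元搜索引擎的检索质量。"),
    ]
    print(deduplicate(pages, ["元搜索"]))
    # ['http://a.example/1', 'http://c.example/3']
```

Because only the sentences that mention the query keyword contribute to a page's feature codes, two result pages that share the passage relevant to the query but differ in navigation text or advertisements are still flagged as duplicates; pages with no keyword-bearing sentences are never matched and are always kept.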