
Research on the Application of Genetic Algorithms to Focused Web Information Collection (cited by: 5)

Research of a Focused Crawler Using Genetic Algorithm
Abstract: Traditional focused crawlers built on local search algorithms suffer from problems such as topic drift and locally optimal crawl results. Drawing on an in-depth study of Web topology and on the crawler's online status, we propose a genetic algorithm that uses global information and dynamically combines each link's immediate reward value with its future reward value. With this algorithm, meta-search techniques can further improve crawler performance. Our experiments show that the new algorithm achieves higher recall and precision and addresses these problems reasonably well.
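The abstract does not give the scoring formula, so the following is only a minimal sketch of one way a link's immediate and future value could be blended into a fitness score and used for fitness-proportional (roulette-wheel) selection from the crawl frontier. The `Link` fields, the weight `alpha`, and the function names are hypothetical and are not taken from the paper. Only the selection step is shown; crossover and mutation are omitted.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    url: str
    immediate: float  # assumed: topical relevance of the link itself, in [0, 1]
    future: float     # assumed: estimated relevance of pages reachable via it, in [0, 1]


def fitness(link: Link, alpha: float) -> float:
    """Blend immediate and future value; alpha is an assumed weight in [0, 1]."""
    return alpha * link.immediate + (1.0 - alpha) * link.future


def select(frontier, k: int, alpha: float, rng: random.Random):
    """Pick up to k distinct links with probability proportional to fitness."""
    pool = list(frontier)
    chosen = []
    while pool and len(chosen) < k:
        weights = [fitness(link, alpha) for link in pool]
        if sum(weights) <= 0:
            # Every remaining link scores zero: fall back to frontier order.
            chosen.extend(pool[: k - len(chosen)])
            break
        pick = rng.choices(pool, weights=weights, k=1)[0]
        pool.remove(pick)
        chosen.append(pick)
    return chosen


if __name__ == "__main__":
    frontier = [
        Link("http://example.org/a", immediate=0.9, future=0.1),
        Link("http://example.org/b", immediate=0.2, future=0.8),
        Link("http://example.org/c", immediate=0.0, future=0.0),
    ]
    for link in select(frontier, k=2, alpha=0.5, rng=random.Random(0)):
        print(link.url, round(fitness(link, 0.5), 2))
```

A crawler could change `alpha` as the crawl progresses to make the combination dynamic, for example by giving future value more weight early on. That schedule is likewise an assumption, not something the abstract specifies.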
Authors: Tang Zhi (唐志), Wang Chengliang (王成良)
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal (北大核心), 2006, No. 7, pp. 71-74 (4 pages)
Keywords: Web spider; genetic algorithm; Web community; information collection


