Abstract
Starting from the application of search engines, this paper examines the role and importance of the Web spider within a search engine and sets out its functional and design requirements. Based on an analysis of the Web spider's system structure and working principles, it studies strategies and algorithms for thread scheduling, page crawling, and HTML parsing. A Web spider program is then implemented in Java, and its run results are analyzed.
Source
《电脑与信息技术》
2007, No. 4, pp. 36-39, 45 (5 pages)
Computer and Information Technology