Abstract
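Search engines are the backbone of Internet information services, and search engine design is an important part of building any website. This paper introduces the classification of search engines and the working process of each type. On this basis, it points out that a spider program consists of two parts, web page downloading and web page content analysis with information extraction, and gives source code examples for both parts using C++Builder as the development tool. Finally, it describes the issues to watch for when designing a spider program.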
Search engines are the main part of Internet information services, and search engine design is an important part of website construction. First, the types of search engines and their working processes are presented. A spider (also called a "crawler" or a "bot") is then shown to have two parts: one part visits every page, or representative pages, on each website that wants to be searchable and reads it, using the hypertext links on each page to discover and read the site's other pages; the other part extracts the keywords of those pages. An example spider program written in C++Builder is presented. Finally, some problems of designing search engines are discussed.
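The two-part structure described in the abstract can be illustrated with a minimal sketch in standard C++. This is not the paper's C++Builder code: the names `FakeWeb`, `download_page`, `extract_links`, and `crawl` are hypothetical, and the download step is replaced by a lookup in an in-memory map so that the example needs no network library. Link extraction uses a simple regular expression that only handles double-quoted `href` attributes, which is an assumption made for brevity.

```cpp
#include <map>
#include <queue>
#include <regex>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the Web: maps a URL to its HTML text.
using FakeWeb = std::map<std::string, std::string>;

// Part 1 (download): here only a map lookup, an assumption that replaces
// a real HTTP request. Returns an empty string for unknown URLs.
std::string download_page(const FakeWeb& web, const std::string& url) {
    auto it = web.find(url);
    return it == web.end() ? std::string() : it->second;
}

// Part 2 (analysis): collect the double-quoted href targets of <a> tags.
std::vector<std::string> extract_links(const std::string& html) {
    static const std::regex href_re(
        R"rx(<a\s[^>]*href\s*=\s*"([^"]*)")rx", std::regex::icase);
    std::vector<std::string> links;
    for (std::sregex_iterator it(html.begin(), html.end(), href_re), end;
         it != end; ++it) {
        links.push_back((*it)[1].str());
    }
    return links;
}

// Breadth-first crawl: download a page, extract its links, and queue
// every link that has not been seen before. Returns pages in visit order.
std::vector<std::string> crawl(const FakeWeb& web, const std::string& start) {
    std::vector<std::string> visited_order;
    std::set<std::string> seen{start};
    std::queue<std::string> frontier;
    frontier.push(start);
    while (!frontier.empty()) {
        std::string url = frontier.front();
        frontier.pop();
        std::string html = download_page(web, url);
        if (html.empty()) continue;  // missing or empty page: skip it
        visited_order.push_back(url);
        for (const std::string& link : extract_links(html)) {
            if (seen.insert(link).second) frontier.push(link);
        }
    }
    return visited_order;
}
```

Calling `crawl(web, "/")` on a small `FakeWeb` returns the reachable pages in the order they were visited. The `seen` set is what keeps the spider from looping when pages link back to each other; a real spider would also need URL normalization and an HTML parser more robust than a regular expression.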
Source
《计算机技术与发展》
2007, Issue 2, pp. 5-7 (3 pages)
Computer Technology and Development
Funding
Natural Science Foundation of the Anhui Provincial Department of Education (2006KJ018A)