Abstract
With the rapid growth of the Internet, more and more users issue queries tied to a specific topic or domain, and traditional general-purpose search engines can no longer meet this need. To overcome these shortcomings, researchers have proposed topic-focused crawlers. This paper first defines the topic-focused web crawler, then presents its three key technologies: the crawl target, the web page search strategy, and the web page topic-relevance algorithm. It closes with some directions for future research on topic-focused crawlers.
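As a rough illustration of how a search strategy and a topic-relevance measure can work together, here is a minimal sketch. It is not the method described in the paper. It assumes a best-first frontier and a cosine score over raw term counts, and it walks a hypothetical in-memory page table instead of fetching real URLs. The names `relevance`, `focused_crawl`, `pages`, `threshold`, and `limit` are invented for this example.

```python
import heapq
import math
import re
from collections import Counter


def relevance(text, topic_terms):
    """Cosine similarity between a page's term counts and the topic terms.

    Assumption: a simple bag-of-words score stands in for whatever
    topic-relevance algorithm a real focused crawler would use.
    """
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    topic = Counter(topic_terms)
    dot = sum(words[t] * topic[t] for t in topic)
    norm = math.sqrt(sum(v * v for v in words.values())) * math.sqrt(
        sum(v * v for v in topic.values())
    )
    return dot / norm if norm else 0.0


def focused_crawl(pages, seeds, topic_terms, threshold=0.2, limit=10):
    """Best-first crawl over a hypothetical in-memory web.

    pages maps url -> (text, [outlinks]); no network access is performed.
    """
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    collected = []
    while frontier and len(collected) < limit:
        _, url = heapq.heappop(frontier)
        text, links = pages.get(url, ("", []))
        score = relevance(text, topic_terms)
        if score < threshold:
            continue  # off-topic page: neither keep it nor follow its links
        collected.append((url, round(score, 3)))
        for link in links:
            if link not in seen:
                seen.add(link)
                # An unvisited link inherits its parent's score as priority.
                heapq.heappush(frontier, (-score, link))
    return collected
```

Calling `focused_crawl(pages, ["seed-url"], ["crawler", "search"])` returns the on-topic pages in visiting order together with their scores. Using the parent's score as the priority of a link is only one possible heuristic; anchor text or link context could serve the same purpose.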
Source
Modern Computer (《现代计算机》)
2014, Issue 2, pp. 19-22 (4 pages)
Keywords
Search Engine
Topic-Focused Crawler
Webpage Analysis
Searching Strategy