Abstract
To address the difficulty that focused crawlers have in reaching an optimal solution during global search, and to improve their accuracy and recall, this paper designs a focused crawler search strategy that incorporates the grey wolf algorithm. Experimental results show that, compared with the traditional breadth-first search strategy and with the genetic algorithm, which is likewise a swarm intelligence algorithm, the focused crawler based on the grey wolf algorithm performs considerably better and crawls more topic-relevant web pages.
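The abstract does not say how the grey wolf algorithm is mapped onto link selection, so the following is only a minimal sketch of the standard continuous grey wolf optimizer on a toy objective, not the authors' crawler strategy. The function name `grey_wolf_minimize`, its parameters, and the `sphere` test objective are all hypothetical and chosen for illustration. In a crawler, the objective would presumably be replaced by some score built from topic relevance and page importance, but that mapping is an assumption.

```python
import random


def grey_wolf_minimize(objective, dim, lower, upper,
                       n_wolves=20, n_iters=200, seed=0):
    """Minimize `objective` over [lower, upper]^dim (hypothetical helper).

    Standard grey wolf optimizer: every wolf moves toward the average of
    three positions suggested by the three best wolves of the pack.
    Requires n_wolves >= 3.
    """
    rng = random.Random(seed)
    wolves = [[rng.uniform(lower, upper) for _ in range(dim)]
              for _ in range(n_wolves)]
    best_pos, best_fit = None, float("inf")

    for t in range(n_iters):
        # Rank the pack; the three fittest wolves (alpha, beta, delta) lead.
        ranked = sorted(wolves, key=objective)
        leaders = [list(w) for w in ranked[:3]]
        fit = objective(leaders[0])
        if fit < best_fit:
            best_pos, best_fit = list(leaders[0]), fit

        # Control parameter: decreases linearly from 2 toward 0, shifting
        # the pack from exploration to exploitation.
        a = 2.0 * (1 - t / n_iters)

        new_wolves = []
        for wolf in wolves:
            new_pos = []
            for d in range(dim):
                estimate = 0.0
                for leader in leaders:
                    A = 2 * a * rng.random() - a   # step coefficient in [-a, a]
                    C = 2 * rng.random()           # leader weight in [0, 2]
                    dist = abs(C * leader[d] - wolf[d])
                    estimate += leader[d] - A * dist
                x = estimate / 3                   # average of three guesses
                new_pos.append(min(max(x, lower), upper))  # clamp to bounds
            new_wolves.append(new_pos)
        wolves = new_wolves

    # The last update has not been ranked yet, so check it once more.
    final_best = min(wolves, key=objective)
    final_fit = objective(final_best)
    if final_fit < best_fit:
        best_pos, best_fit = list(final_best), final_fit
    return best_pos, best_fit


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    pos, fit = grey_wolf_minimize(sphere, dim=3, lower=-10.0, upper=10.0)
    print(pos, fit)
```

The sketch keeps the best position seen so far separately from the pack, because the averaging step can move every wolf, including the current leader, to a worse position.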
Authors
萧婧婕
陈志云
XIAO Jing-jie; CHEN Zhi-yun (Department of Computer Science and Technology, East China Normal University, Shanghai 200062, China)
Source
《计算机科学》
CSCD
PKU Core Journal (北大核心)
2018, Issue B11, pp. 146-148, 166 (4 pages in total)
Computer Science
Funding
Supported by the MOOC-based computer course resource construction project
Keywords
Focused crawler
Grey wolf algorithm
Thematic relevance
Webpage importance