Abstract
Crawlers are a core component of search engines and websites, and a dedicated web crawler can collect a large amount of useful data from the web in a short time. To crawl travel data from tourism websites and analyze the popular regions and attractions listed on them, this paper studies a focused web crawler for tourism websites built on the Scrapy framework. The crawled data are analyzed and visualized with the third-party libraries Pandas and Matplotlib in PyCharm. The experimental results show that the proposed tourism-themed focused crawler improves the retrieval efficiency of tourism data, quickly locates the required information within the massive data of tourism websites, and provides a reference both for travelers planning trips and for regions and scenic spots seeking to optimize their services.
Author
ZHAO Qiang (赵蔷), School of Computer Science, Xianyang Normal University, Xianyang 712000, China
Source
Electronic Design Engineering (《电子设计工程》)
2022, No. 16, pp. 152-155 (4 pages)
Funding
2017 Project of the Shaanxi Provincial Education Science "13th Five-Year Plan" (SGH17H197).