Abstract
The diversity and complexity of data on the web make data acquisition considerably harder, and commonly used web crawlers are no longer adequate for precise data search, acquisition, and analysis. Python is a simple language and offers multi-threaded, distributed crawler frameworks, which makes implementing a web crawler far less complicated. The configurable web crawler described here is a multi-threaded crawler program built with Python 2.7 and MySQL that crawls data and stores it in a database. Because the required crawler can be implemented with only a small amount of code, Python is a fast and effective choice for configurable crawler design.
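To make the idea of a configurable, multi-threaded crawler that writes to a database more concrete, here is a minimal sketch. It is not the authors' implementation. It assumes Python 3 rather than the paper's Python 2.7, and it uses the standard-library `sqlite3` module as a stand-in for MySQL so that the example stays self-contained. The `CONFIG` layout, the `pages` table, and the function names `fetch`, `extract`, and `crawl` are all hypothetical.

```python
import re
import sqlite3
import threading
from queue import Queue, Empty
from urllib.request import urlopen

# Hypothetical configuration: keys, URL, and pattern are illustrative only,
# not taken from the paper.
CONFIG = {
    "urls": ["https://example.com/"],
    "threads": 4,
    "fields": {"title": r"<title>(.*?)</title>"},
}


def fetch(url):
    """Download a page and return its text (assumes UTF-8 content)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract(html, fields):
    """Apply each configured regex; keep the first match or None."""
    row = {}
    for name, pattern in fields.items():
        match = re.search(pattern, html, re.S)
        row[name] = match.group(1).strip() if match else None
    return row


def crawl(config, conn, fetcher=fetch):
    """Fetch all configured URLs with worker threads, then store the rows."""
    tasks = Queue()
    for url in config["urls"]:
        tasks.put(url)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                url = tasks.get_nowait()
            except Empty:
                return  # queue drained, worker exits
            try:
                row = extract(fetcher(url), config["fields"])
            except Exception:
                continue  # skip pages that fail to download or parse
            with lock:
                results.append((url, row))

    threads = [threading.Thread(target=worker) for _ in range(config["threads"])]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Write from the calling thread: a sqlite3 connection is bound to the
    # thread that created it by default. Field names come from trusted config.
    names = list(config["fields"])
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, %s)" % ", ".join(names)
    )
    marks = ", ".join("?" * (len(names) + 1))
    conn.executemany(
        "INSERT OR REPLACE INTO pages VALUES (%s)" % marks,
        [(url, *[row[n] for n in names]) for url, row in results],
    )
    conn.commit()
    return len(results)
```

Under these assumptions, calling `crawl(CONFIG, sqlite3.connect("crawl.db"))` would fetch the configured URLs and store one row per page. Changing what is collected only requires editing `CONFIG`, which is the sense in which such a crawler is "configurable". A MySQL-backed version would swap the connection and adjust the SQL dialect.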
Authors
苏国新
苏聿
SU Guo-xin; SU Yu (Xiamen Ocean Vocational College, Xiamen, Fujian 361000, China; Tencent Technology (Shenzhen) Co., Ltd., Shenzhen, Guangdong 400300, China)
Source
《宁德师范学院学报(自然科学版)》
2018, Issue 4, pp. 364-368 (5 pages)
Journal of Ningde Normal University (Natural Science)