Abstract
Redundant data on the network interferes with acquisition systems: the collected information becomes mixed, and data acquisition slows down. To address this problem, this paper designs a big data acquisition system based on web crawler technology. In the hardware part, a multi-channel data acquisition board is designed and controlled by a double-cascade phase-locked loop structure; the hardware connection circuit is designed and the connection lines are laid out. In the software part, web crawler technology is used to set the data collection rules, and the collected data are extracted, fused, and filtered. An information data correlation measure is then defined from the binary mutual information between different collected data items, and the collected big data are sorted accordingly, which completes the design of the system. An experimental environment is built on a PC with known parameters, and the designed system is compared experimentally with two traditional big data acquisition systems. The results show that the designed system has the fastest data acquisition speed of the three.
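The sorting step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact correlation formula, so the code below is an assumption: it uses the standard plug-in estimate of mutual information between two discrete sequences and ranks collected fields by that score against a reference sequence. The names `mutual_information`, `rank_by_relevance`, `fields`, and `reference` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two equal-length discrete sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have equal length")
    n = len(xs)
    if n == 0:
        return 0.0
    count_x = Counter(xs)            # marginal counts of X
    count_y = Counter(ys)            # marginal counts of Y
    count_xy = Counter(zip(xs, ys))  # joint counts of (X, Y)
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_by_relevance(fields, reference):
    """Return (field name, score) pairs sorted by mutual information, highest first.

    Assumption: `fields` maps a hypothetical field name to its collected values,
    and `reference` is the sequence the fields are scored against.
    """
    scores = {name: mutual_information(values, reference)
              for name, values in fields.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With this estimator, a field that is a copy of the reference scores highest and a statistically independent field scores zero, so sorting by the score places the most informative collected data first. Whether the paper normalizes or thresholds the score is not stated in the abstract.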
Author
LUO Chun (罗春)
University of Electronic Science and Technology of China, Chengdu 610054, China; Sichuan Tianyi College, Mianzhu 618200, China
Source
Modern Electronics Technique (《现代电子技术》)
2021, No. 16, pp. 115-119 (5 pages)
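Funding
Natural Science Foundation of Sichuan Province project (19ZDYF0028)
Key Project of Humanities and Social Sciences, Sichuan Provincial Department of Education (18SA0090)
General Project of Natural Science, Sichuan Provincial Department of Education (17ZB0218)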
Keywords
big data acquisition
web crawler technology
data acquisition board
hardware connection
software design
simulation experiment