期刊文献+
共找到1篇文章
< 1 >
每页显示 20 50 100
An Improvement of Data Cleaning Method for Grain Big Data Processing Using Task Merging 被引量:1
1
作者 feiyu lian Maixia Fu Xingang Ju 《Journal of Computer and Communications》 2020年第3期1-19,共19页
Data quality has exerted important influence over the application of grain big data, so data cleaning is a necessary and important work. In MapReduce frame, parallel technique is often used to execute data cleaning in... Data quality has exerted important influence over the application of grain big data, so data cleaning is a necessary and important work. In MapReduce frame, parallel technique is often used to execute data cleaning in high scalability mode, but due to the lack of effective design, there are amounts of computing redundancy in the process of data cleaning, which results in lower performance. In this research, we found that some tasks often are carried out multiple times on same input files, or require same operation results in the process of data cleaning. For this problem, we proposed a new optimization technique that is based on task merge. By merging simple or redundancy computations on same input files, the number of the loop computation in MapReduce can be reduced greatly. The experiment shows, by this means, the overall system runtime is significantly reduced, which proves that the process of data cleaning is optimized. In this paper, we optimized several modules of data cleaning such as entity identification, inconsistent data restoration, and missing value filling. Experimental results show that the proposed method in this paper can increase efficiency for grain big data cleaning. 展开更多
关键词 GRAIN BIG DATA DATA Cleaning TASK MERGING Hadoop MAPREDUCE
下载PDF
上一页 1 下一页 到第
使用帮助 返回顶部