Abstract
As enterprise informatization advances and the requirements of refined enterprise management rise, the demand for data management keeps growing, and improving enterprise data quality has become a key problem to be solved. Addressing the data quality management challenges faced by power enterprises, a distributed data quality management solution is proposed. To overcome the performance bottleneck of a centralized data quality system, the characteristics of data quality systems were studied and domestic and foreign big data solutions were reviewed, leading to a solution based on the Hadoop distributed processing framework. With a Hadoop cluster, defect data can be extracted from Oracle and stored across multiple servers in the cluster, effectively improving disk I/O performance and data analysis performance.
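The abstract does not detail the analysis jobs that run on the cluster, so the following is only a minimal sketch, assuming the defect records exported from Oracle land on HDFS as comma-separated text and that one typical job aggregates defect counts per quality rule; the class names, field layout, and paths are illustrative assumptions, not the paper's implementation.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * Minimal sketch: count defect records per quality-rule ID over
 * CSV-like defect data exported from Oracle onto HDFS.
 * The record layout (rule ID in the second field) is an assumption.
 */
public class DefectCountByRule {

    public static class RuleMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text ruleId = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Assumed record format: recordId,ruleId,tableName,columnName,...
            String[] fields = value.toString().split(",");
            if (fields.length > 1) {
                ruleId.set(fields[1].trim());
                context.write(ruleId, ONE);
            }
        }
    }

    public static class SumReducer
            extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "defect count by rule");
        job.setJarByClass(DefectCountByRule.class);
        job.setMapperClass(RuleMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        // args[0]: HDFS directory holding the exported defect data
        // args[1]: output directory for per-rule defect counts
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Submitted with `hadoop jar defect-count.jar DefectCountByRule /data/defects /out/defect-counts` (paths hypothetical), the map tasks run on the DataNodes that hold each block of the exported defect files, which is how a Hadoop cluster spreads disk I/O and analysis work across servers instead of concentrating it on a single Oracle host.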
Source
Telecommunications Science (《电信科学》)
Peking University Core Journal (北大核心)
2016, No. 4, pp. 169-174 (6 pages)