Research on Improved Parallel I/O for Data Distribution System
Abstract As the number of users and the complexity of business workloads grow, the external data-serving capability of the data warehouse urgently needs to improve. A data distribution system, which manages distribution through a unified interface, inevitably faces communication congestion caused by concurrent multi-user data access. This paper uses the open-source Kettle tool to build a data distribution application and applies parallel computing to improve the efficiency of the serial algorithm. As part of the parallelization, the traditional data distribution and collection parallel I/O scheme is described in detail and a time estimation equation is constructed for it. Based on an analysis of the bottlenecks of that scheme, and borrowing ideas from Google File System, an improved parallel I/O scheme based on metadata is proposed. Experiments show that, regardless of the number of parallel computing processes (computing units), the metadata-based parallel I/O scheme outperforms the data distribution and collection scheme, with shorter data import and export times.
Authors XIAO Zhao-di; HUANGFU Han-cong; YU Yong-zhong; LV Shun-feng (Foshan Power Supply Bureau, Guangdong Power Grid Co., Ltd., Foshan 528000, China; Guangdong Zhuo Wei Network Co., Ltd., Foshan 528000, China)
Source Techniques of Automation and Applications (《自动化技术与应用》), 2018, No. 10, pp. 38-42 (5 pages)
Keywords data distribution; parallel computing; parallel I/O; Google File System; metadata
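
The abstract contrasts two parallel I/O schemes without giving their details, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the "metadata" is a list of (offset, length) byte ranges, uses threads as a stand-in for parallel computing processes, and all function names (`build_metadata`, `scatter_read`, `read_chunk`, `metadata_read`) are hypothetical. In the distribution and collection style, a single root performs all file I/O and hands out slices; in the metadata style, the root sends only the small range descriptors and each worker reads its own range directly.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def build_metadata(file_size, n_workers):
    """Split [0, file_size) into contiguous (offset, length) chunks, one per worker.

    Assumption: the metadata is just a byte range; n_workers must be >= 1.
    """
    base, extra = divmod(file_size, n_workers)
    chunks, offset = [], 0
    for rank in range(n_workers):
        # The first `extra` workers take one additional byte each.
        length = base + (1 if rank < extra else 0)
        chunks.append((offset, length))
        offset += length
    return chunks


def scatter_read(path, n_workers):
    """Distribution/collection style: the root reads everything, then slices it."""
    with open(path, "rb") as f:
        data = f.read()  # all file I/O is funnelled through a single reader
    return [data[off:off + ln] for off, ln in build_metadata(len(data), n_workers)]


def read_chunk(path, offset, length):
    """Worker-side read: open the file independently and fetch only one range."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def metadata_read(path, n_workers):
    """Metadata style: only (offset, length) is handed out; workers do the I/O."""
    chunks = build_metadata(os.path.getsize(path), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(read_chunk, path, off, ln) for off, ln in chunks]
        # Results are collected in rank order, so they can be concatenated.
        return [f.result() for f in futures]
```

Both functions return the same list of per-worker byte strings; the difference is only in who touches the file. Whether this yields the speedups reported in the paper depends on the storage and communication layers, which this sketch does not model.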