Optimization and reconstruction of shuffle in MapReduce (cited by: 8)
Abstract This paper describes the MapReduce programming framework in detail and analyzes the workflow of its shuffle stage. Shuffle is optimized and reconstructed in three ways: compressing the map-side output, redesigning the transfer protocol used to copy data from the map side to the reduce side, and optimizing memory allocation on the reduce side. Finally, a Hadoop cluster is built and the experimental data are tested with MapReduce distributed algorithms. The experimental results show that the optimized and reconstructed shuffle significantly improves MapReduce computing performance.
Source China Sciencepaper (《中国科技论文》), CAS, PKU Core Journal, 2012, No. 4, pp. 241-245 (5 pages)
Funding Project supported by the Tsinghua-Tencent Joint Laboratory for Internet Innovation Technology (2011-8)
Keywords cloud computing; Hadoop; MapReduce; shuffle
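
The first of the three measures named in the abstract, compressing map-side output before it is copied to the reduce side, can be illustrated with a minimal sketch. This is not the paper's implementation and does not use Hadoop; it assumes a word-count style mapper, a tab-separated text encoding, and Python's standard `zlib` module standing in for whatever codec the authors chose. All function names here are hypothetical.

```python
import zlib

def map_phase(lines):
    """Emit (word, 1) pairs, as a word-count mapper would (illustrative only)."""
    return [(word, 1) for line in lines for word in line.split()]

def serialize(pairs):
    """Encode intermediate pairs as tab-separated text, one pair per line."""
    return "\n".join(f"{key}\t{value}" for key, value in pairs).encode("utf-8")

def deserialize(blob):
    """Inverse of serialize(); an empty payload yields an empty list."""
    text = blob.decode("utf-8")
    if not text:
        return []
    return [(key, int(value))
            for key, value in (row.split("\t") for row in text.split("\n"))]

def compress_map_output(pairs, level=6):
    """Map side: compress the serialized output before it crosses the network."""
    return zlib.compress(serialize(pairs), level)

def fetch_and_decompress(payload):
    """Reduce side: decompress the fetched payload and restore the pairs."""
    return deserialize(zlib.decompress(payload))

# Assumed sample input: highly repetitive text, which compresses well.
lines = ["the quick brown fox jumps over the lazy dog"] * 1000
pairs = map_phase(lines)
raw = serialize(pairs)
payload = compress_map_output(pairs)
restored = fetch_and_decompress(payload)

print("uncompressed bytes:", len(raw))
print("compressed bytes:  ", len(payload))
print("round trip intact: ", restored == pairs)
```

The trade-off is CPU time spent compressing and decompressing against bytes moved during the copy phase; how much is gained depends on the data and the codec, which this sketch does not measure.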