Performance Optimization of MapReduce Based on Applications

Original title: 基于应用程序的MapReduce性能优化 (cited by: 4)
Abstract: To address the poor execution efficiency of the MapReduce framework, this paper studies several approaches to MapReduce performance optimization. It first describes the definition and characteristics of cloud computing and the components of Hadoop, a PaaS platform dedicated to batch processing, then briefly introduces the MapReduce framework and application development under it. It then focuses on the three mainstream directions of MapReduce performance optimization: system implementation optimization, parameter tuning, and application optimization. Starting from the application side, it proposes several solutions and runs several experiments, including an in-Map Reduce optimization algorithm, a comparison of scripting and compiled languages, and preprocessing optimization for small files. Finally, the optimization techniques and experimental data are analyzed. The experimental results show that optimizing the application is an effective means of improving MapReduce performance.
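Source: Computer Technology and Development (《计算机技术与发展》), 2015, No. 7, pp. 96-99, 106 (5 pages)
Funding: Jiangsu Province Excellent Engineer (Software) Program Pilot Major (苏教高函 [2012] No. 17); Jiangsu Province Higher Education Embedded Talent Training Project for Software Service Outsourcing Majors (苏教高函 [2014] No. 14); Jiangsu Electric Power Company Science and Technology Project (J2014057); Sanjiang University Undergraduate Engineering Phase II Project (J14021)
Keywords: MapReduce; application; performance optimization; in-Map Reduce; tuning for small files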
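
The abstract names an in-Map Reduce optimization but does not spell out the algorithm. The sketch below assumes it refers to the well-known in-mapper combining pattern, in which each map task aggregates values locally before emitting them, so fewer key-value pairs reach the shuffle phase. It is a plain-Python simulation, not Hadoop code; the function names and the sample input split are hypothetical.

```python
from collections import Counter, defaultdict


def naive_map(lines):
    """Emit one (word, 1) pair per token, as a plain word-count mapper would."""
    for line in lines:
        for word in line.split():
            yield word, 1


def in_mapper_combining_map(lines):
    """Aggregate counts inside the map task and emit each word once per split."""
    local_counts = Counter()
    for line in lines:
        local_counts.update(line.split())
    yield from local_counts.items()


def reduce_counts(pairs):
    """Sum values per key, standing in for the shuffle and reduce phases."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


split = ["a b a", "b a c"]  # hypothetical input split
naive_pairs = list(naive_map(split))
combined_pairs = list(in_mapper_combining_map(split))
```

Both mappers lead to the same reduced totals, but the combining mapper emits one pair per distinct word in the split rather than one pair per token. The trade-off is that the local dictionary must fit in the map task's memory.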