Abstract
In MapReduce environments, data dependencies and the task execution model cause reduce tasks to incur substantial remote data access delay and unnecessary resource competition, which degrades system performance. To address this problem, we propose a data pre-fetching method based on pre-scheduling. The method hides the remote data access delay caused by reduce tasks by pre-fetching data, and reduces the resource competition they cause by controlling the resources allocated to them. The method is implemented in Hadoop-0.20.2. Experimental results show that it improves system performance by more than 10% compared with both default Hadoop MapReduce and the Hadoop Online Prototype.
Source
Journal of Xidian University (《西安电子科技大学学报》)
Indexed in: EI, CAS, CSCD, PKU Core (北大核心)
2014, Issue 2, pp. 191-196 (6 pages)
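Funding
National Natural Science Foundation of China (51274088)
Henan Provincial Department of Education project (ITE12103)
Henan Polytechnic University Doctoral Fund (B2012-099)
Henan Polytechnic University Provincial Key Laboratory of Mine Informatization project (KY2012-05)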