期刊文献+

一种基于分布式存储系统中多节点修复的节点选择算法 被引量:11

Node Selection Algorithm During Multi-Nodes Repair Progress in Distributed Storage System
下载PDF
导出
摘要 在分布式存储系统中,如何优化失效数据的修复时间以保证系统的高可靠性,已引起了人们的广泛关注.近几年的研究发现修复过程中不同的节点选择机制对数据的再生时间产生很大的影响,已有工作提出了单节点失效场景下的节点选择SPSN(select provider select newcomer)算法,系统中往往存在多个节点同时修复的情况,此时,SPSN算法巨大的时空开销使得数据的再生时间不再最优.对已有真实系统的失效数据及原因进行统计;基于已有算法特点和修复模型,提出了具有更优的多节点选择B-WSJ(bandwidth based weak and strong judgement)算法.为了更好地描述算法,对带宽中节点的关系进行分类,算法利用节点关系分别实现了修复模型中目标节点的浅度和深度判断,并加入一定的预处理和剪枝策略,最终快速选择出具有较优带宽的节点集合.为了评估B-WSJ算法性能,使用Waxman算法产生网络拓扑,依据FTA(failure trace archive)网站所给的真实系统的节点失效模型进行多次实验,仿真结果表明:B-WSJ算法使得节点修复性能得到了很大的提升. In distributed storage systems,how to optimize the regeneration time of lost data so as to keep high reliability has attracted attention increasingly.Recent researches reveal that node selection mechanism during repair progress has great impact on regeneration time.SPSN(select provider select newcomer)algorithm has put forward by some studies,which suits the scenario of single node failure well.However,it is very common to repair many modes at the same time in actual system.In this scenario,SPSN algorithm will no longer be optimal taking large time and space consumption into consideration.By analyzing the data failure trace of real distributed file system,we propose a node selection algorithm B-WSJ(bandwidth based weak and strong judgement)based on the existing algorithms and repairing model with the characteristic of parallelism which is suitable for multi-failure scenario.In order to describe the algorithm better,we firstly define several concepts of noderelationship on a link.Secondly we use these concepts to realize the weak and strong judgment of target node with pre-process and pruning strategy added.Finally,the nodes with better bandwidth will be chosen.To evaluate the performance of NS algorithm,we use Waxman algorithm to generate network topology and do many experiments with node failure models in real system provided by FTA(failure trace archive).The experimental results show the performance of B-WSJ algorithm can be improved greatly compared with the existing algorithms.
作者 刘佩 蒋梓逸 曹袖 Liu Pei;Jiang Ziyi;Cao Xiu(School of Computer Science and Technology,Fudan University,Shanghai 201203;Engineering Research Center of Cyber Security Auditing and Monitoring(Fudan University),Ministry of Education,Shanghai 200433)
出处 《计算机研究与发展》 EI CSCD 北大核心 2018年第7期1557-1568,共12页 Journal of Computer Research and Development
关键词 分布式存储系统 数据修复 再生时间 多节点失效 节点选择 distributed storage system data repair regeneration time multi node failure node selection
  • 相关文献

参考文献4

二级参考文献63

  • 1Layman P, Varian H R. How much information 2003? [EB/OL]. [2010 10-18]. http://www2, sims. berkeley. edu/research/proiects/how-mueh-info-2003.
  • 2Pinheiro E, Weber W D, Barroso L A. Failure trends in a large disk drive population [C] //Proc of the 5th USENIX Conf on File and Storage Technologies. Berkeley, CA: USENIX Association, 2007 : 17-28.
  • 3Schroeder B, Gibson G A. Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you? [C] //Proc of the 5th USENIX Conf on File and Storage Technologies. Berkeley, CA: USENIX Association, 2007: 1-16.
  • 4Bairavasundaram L N, Goodson G R, Pasupathy S, et al. An analysis of latent sector errors in disk drives [C]//Proc of 2007 ACM SIGMETRICS Int Conf on Measurement and Modeling of Computer Systems. New York: ACM, 200: 289-300.
  • 5Hafner J M, Deenadhayalan V, Rao K, et al. Matrix methods for lost data reconstruction in erasure codes [C] // Proc of the 4th USENIX Conf on File and Storage Technologies. Berkeley, CA: USENIX Association, 2005: 183-196.
  • 6Hafner J M, Deenadhayalan V, Kanungo T, et al. Performance metrics for erasure codes in storage systems, RJ 10321 [R]. San Jose, [A] IBM Research, 2004.
  • 7Li M, Shu J, Zheng W. GRID Codes: Strip based erasure codes with high fault tolerance for storage systems [J].ACM Transon Storage, 2009, 4(4): 1-22.
  • 8Blaum M, Brady J, Bruek J, et al. EVENODD: An efficient scheme for tolerating double disk failures in RAID architectures [J].IEEE Trans on Computer, 1995, 44 (2) 192-202.
  • 9Corbett P, English B, Goel A, et al. Row-diagonal redundant for double disk failure correction [C] //Proc of the 3rd USENIX Conf on File and Storage Technologies. Berkeley, CA: USENIX Association, 2004:2-15.
  • 10Xu L, Bruck J. X-code: MDS array codes with optimal encoding[J]. IEEE Trans on Information Theory, 1999, 45 (1) : 272-276.

共引文献362

同被引文献52

引证文献11

二级引证文献29

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部