摘要
分布式存储系统为了保证可靠性,会采用一定的存储冗余策略,如多副本策略、纠删码策略.纠删码相对于副本具有存储开销小的优点,但节点修复网络开销大.针对修复网络开销优化,业界提出再生码和以简单再生码为代表的局部可修复码,显著降低了修复网络开销.然而,现有的基于编码的分布式容错存储方案大都假设节点处于星型逻辑网络结构中,忽略了实际的物理网络拓扑结构和带宽信息.为了实现拓扑感知的容错存储优化,相关研究在纠删码和再生码修复过程中结合网络链路带宽能力,建立树型修复路径,进一步提高了修复效率.但是,由于编码和修复过程的差异性,上述工作并不适合于简单再生码修复.针对该问题,结合实际物理网络拓扑结构,将链路带宽能力引入到简单再生码的修复过程中,对带宽感知的简单再生码修复优化技术开展研究.建立了带宽感知节点修复时延模型,提出了基于最优瓶颈路径和最优修复树的并行修复树构建算法,并通过实验对算法性能进行了评估.实验结果表明,与星型修复方式相比,该算法有效地降低了节点修复时延,提高了修复效率.
In order to ensure the data reliability, fault-tolerant distributed storage systems usually apply some data redundant strategies such as the multi-replicas strategy and the erasure coding strategy. Compared with the multi-replicas strategy, the erasure coding strategy has an advantage of storing less data while costs much more network traffic when repairing a failed node. To reduce the repair network cost, the regenerating code and locally repairable codes were proposed, and simple regenerating code is a representative of the locally repairable codes. However, most of the current fault-tolerant distributed storage methods based on coding strategy assume that the storage nodes are located in the star shaped logic network, ignoring the physical network topology and link bandwidth capacity. So with applying the physical network topology into the repair process of erasure code and regenerating code, some related researches propose methods of building tree shaped repair paths to get further more efficiency for repairing a failed node. But because of the difference in encoding and repairing process, those methods are not fit for the repair of simple regenerating code. To resolve the problem, this paper introduces the link bandwidth capacity into the repair process of simple regenerating code based on the physical network topology. It builds a bandwidth-aware node repair analysis model, and proposes an algorithm to build parallel repair trees based on the optimal bottleneck path and optimal repair tree methods. Experiments are also designed to evaluate the performance of the algorithm. The result shows that compared with the method based on star shaped network topology, the proposed algorithm reduces the latency of repair process effectively.
出处
《软件学报》
EI
CSCD
北大核心
2017年第8期1940-1951,共12页
Journal of Software
基金
国家自然科学基金(61373014)
国家电网科技项目(521104170019)~~
关键词
分布式存储
简单再生码
网络拓扑
节点修复
distributed storage
simple regenerating code
network topology
node repair