Abstract
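The Ceph distributed file system, used in cloud storage, has attracted wide attention and adoption in industry and research because it is open source and provides unified storage. CRUSH is the pseudo-random data distribution algorithm in Ceph; it distributes data objects and their replicas efficiently across large-scale, heterogeneous, hierarchically structured storage clusters. When selecting storage nodes in replication mode, the stock Ceph system uses node storage capacity as the only criterion and does not consider network or node load, which degrades read and write performance when network performance is poor or nodes are heavily loaded. Incorporating measurements of network state and node load into the CRUSH algorithm is therefore important for improving load balance. In a traditional network architecture, however, obtaining network state requires cumbersome configuration and substantial measurement overhead. To address these problems, this paper designs a Ceph cloud storage system model and a storage node selection strategy based on software-defined networking (SDN). First, SDN is used to obtain network and load status in real time, which simplifies network configuration and reduces measurement overhead. Then, storage node locations are determined by building and solving a multi-attribute decision-making model that takes multiple factors into account. Read and write tests in a real environment show that, compared with the existing CRUSH algorithm, the proposed node selection method maintains the same write performance as the original Ceph system while clearly improving throughput when reading small files and response time when reading large files.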
Traditional storage models cannot meet the demands of massive data for capacity scalability, data reliability, and high performance, and cloud storage systems emerged against this background. Cloud storage systems use distributed file systems and related technologies to assemble different storage devices into a resource pool over network connections and provide storage services in a unified way, with high scalability and high reliability. Among the many distributed file systems used for cloud storage, Ceph has attracted wide attention in both research and industry because it is open source and provides unified storage, and it is one of the most commonly used cloud storage systems. The data distribution strategy is a key technology in a distributed file system: it determines where data is stored and affects the system's load balancing and fault tolerance. CRUSH is the pseudo-random data distribution algorithm in Ceph; it distributes data objects and their replicas efficiently across large-scale, heterogeneous, hierarchically structured storage clusters. However, in the replication scheme, the CRUSH algorithm of the open-source Ceph storage system uses storage capacity as the sole criterion for selecting storage nodes. It ignores the load on both the network and individual nodes, which degrades the system's read and write performance under heavy node load or poor network conditions. Using network state and node load information in the CRUSH algorithm is therefore important for improving load balance, but in a traditional network architecture obtaining this information requires cumbersome configuration and considerable measurement overhead.
To address these deficiencies, we propose a Ceph enhancement that incorporates a software-defined network (SDN) abstraction and an improved strategy for storage node selection. First, the load status of the nodes and the network is obtained via SDN, which simplifies network configuration and reduces measurement overhead. SDN separates the control plane from the data plane; its centralized control plane simplifies network measurement and management and provides a flexible and efficient maintenance strategy, so we adopt SDN to monitor network and node load. Second, we establish a multi-attribute decision-making model to select storage nodes. It aims to solve the load imbalance among storage nodes caused by using storage capacity as the only constraint in the CRUSH algorithm. The improved CRUSH algorithm adds network state and node load to the node weight factor, so the weight factor is determined at a finer granularity. We tested the proposed model in a live environment. The results indicate that, compared with the original CRUSH algorithm, the designed model and strategy significantly improve throughput for reading small files and response time for reading large files, while offering write performance similar to that of the unmodified Ceph storage system.
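The abstract does not specify the multi-attribute decision-making model, so the following is only a minimal sketch of the general idea, using simple additive weighting as a stand-in technique. The attribute names (`free_capacity_gb`, `cpu_load`, `link_utilization`), the coefficients, and the example nodes are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_capacity_gb: float   # benefit attribute: larger is better
    cpu_load: float           # cost attribute in [0, 1]: smaller is better
    link_utilization: float   # cost attribute in [0, 1]: smaller is better


def normalize(values, benefit):
    """Min-max scale to [0, 1]; cost attributes are inverted so 1 is best."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    if benefit:
        return [(v - lo) / (hi - lo) for v in values]
    return [(hi - v) / (hi - lo) for v in values]


def composite_weights(nodes, coeffs=(0.4, 0.3, 0.3)):
    """Combine capacity, node load and network load into one weight per node.

    The coefficients are illustrative assumptions, not values from the paper.
    """
    cap = normalize([n.free_capacity_gb for n in nodes], benefit=True)
    cpu = normalize([n.cpu_load for n in nodes], benefit=False)
    net = normalize([n.link_utilization for n in nodes], benefit=False)
    a, b, c = coeffs
    return {
        n.name: a * cap[i] + b * cpu[i] + c * net[i]
        for i, n in enumerate(nodes)
    }


def select_replicas(nodes, replicas=3):
    """Return the names of the highest-weighted nodes, best first."""
    weights = composite_weights(nodes)
    ranked = sorted(weights, key=weights.get, reverse=True)
    return ranked[:replicas]


# Hypothetical cluster: osd.0 and osd.3 have the most free space,
# but both are heavily loaded on CPU or network.
example_nodes = [
    Node("osd.0", 800, 0.9, 0.8),
    Node("osd.1", 500, 0.2, 0.1),
    Node("osd.2", 400, 0.3, 0.2),
    Node("osd.3", 900, 0.5, 0.9),
]

if __name__ == "__main__":
    print(select_replicas(example_nodes, replicas=2))
```

With these made-up numbers, a capacity-only ranking would prefer `osd.3` and `osd.0`, whereas the composite weight favors the lightly loaded `osd.1`. In a deployment along the lines the abstract describes, the load figures would come from the SDN controller's measurements, and the resulting weights would feed into CRUSH rather than a plain sort.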
Authors
王勇
叶苗
何倩
郇宜鸣
康文杰
WANG Yong; YE Miao; HE Qian; HUAN Yi-Ming; KANG Wen-Jie (School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi 541004; Key Laboratory of Cognitive Radio and Information Processing, Guilin University of Electronic Technology, Guilin, Guangxi 541004; Information Science and Technology, Guilin University of Technology, Guilin, Guangxi 541004)
Source
Chinese Journal of Computers (《计算机学报》)
EI
CSCD
PKU Core Journal (北大核心)
2019, No. 2, pp. 323-338 (16 pages)
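Funding
National Natural Science Foundation of China (61662018, 61661015, 61831013)
China Postdoctoral Science Foundation (2016M602922XB)
Guangxi Innovation-Driven Development Special Project (Science and Technology Major Project, Guike AA18118031)
Research Start-up Fund of Guilin University of Technology (GUTQDJJ20172000019)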
Keywords
software-defined network
Ceph storage
multi-attribute decision-making
replication scheme
weighting factor