Abstract
In a distributed file system (DFS), the number of replicas is usually pre-configured and cannot adapt to dynamic changes in file access demand. To address this problem, a Dynamic Replica Creation Algorithm (DRCA) based on temperature (access-popularity) analysis is proposed to optimize replica management. DRCA estimates a file's access temperature by analyzing its access frequency within a specified time window. Taking into account factors such as the statistical period, file size and working environment, it then adjusts the number of replicas of each file dynamically and on demand, which reduces the average response time of file accesses and improves data-service performance. A DRCA module was developed on top of the Hadoop Distributed File System (HDFS) and its performance was tested. The results show that DRCA delivers better data-service performance than HDFS's built-in replica creation algorithm.
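The abstract does not give DRCA's actual formulas, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that "temperature" is an exponentially decayed sum of per-period access counts, that larger files damp the number of extra replicas (because copying them is more expensive), and that the "working environment" is represented solely by the number of DataNodes, which caps the replica count. The names `access_temperature`, `target_replicas`, `decay`, `hot_threshold` and `size_scale_mb` are all hypothetical, as are their default values; only the lower bound of 3 mirrors HDFS's default replication factor.

```python
import math


def access_temperature(counts, decay=0.5):
    """Hypothetical temperature: exponentially weighted access count.

    `counts` holds the number of accesses in each statistical period,
    oldest first. The newest period has weight 1, the one before it
    weight `decay`, then `decay**2`, and so on.
    """
    temp = 0.0
    for c in counts:
        temp = temp * decay + c  # older periods fade by `decay` each step
    return temp


def target_replicas(temp, size_mb, num_datanodes,
                    hot_threshold=100.0, size_scale_mb=1024.0,
                    min_rep=3, max_rep=10):
    """Hypothetical rule mapping temperature to a replica count.

    One extra replica per `hot_threshold` units of temperature, scaled
    down for large files and capped by cluster size (a stand-in for the
    "working environment" factor).
    """
    extra = math.floor(temp / hot_threshold)
    # Larger files are costlier to copy, so they earn fewer extra replicas.
    extra = math.floor(extra / (1.0 + size_mb / size_scale_mb))
    # There cannot be more replicas than DataNodes to place them on.
    upper = min(max_rep, num_datanodes)
    return max(min(min_rep, upper), min(upper, min_rep + extra))


if __name__ == "__main__":
    t = access_temperature([100, 200, 400])  # 525.0
    print(target_replicas(t, size_mb=128, num_datanodes=20))  # → 7
```

In a real deployment, the computed count could be applied with the HDFS shell command `hdfs dfs -setrep <numReplicas> <path>` or the Java method `FileSystem.setReplication(Path, short)`. Whether and how often DRCA triggers such changes is not stated in the abstract.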
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journal (北大核心)
2014, Issue A02, pp. 130-134 (5 pages)
Funding
National Natural Science Foundation of China (Grant No. 60904082)
Keywords
Distributed File System (DFS)
replica
creation
temperature
access frequency
Hadoop Distributed File System (HDFS)