
Approach of Distributed Small File Storage Based on Association Rule Mining (cited by: 8)
Abstract: The Hadoop Distributed File System (HDFS) was originally designed for processing large files and stores small files inefficiently. This paper therefore proposes ARMFS, an efficient small-file storage method based on association rule mining. ARMFS mines association rules from the Hadoop audit logs to obtain the associations among small files, then uses a file merging algorithm to merge and compress the small files for storage in HDFS. When a file is requested from HDFS, a prefetching algorithm uses the high-frequency access table and the prefetching table, both derived from the mined association rules, to further improve access efficiency. Experimental results show that ARMFS markedly improves memory efficiency on the NameNode and effectively improves the download speed and access efficiency of small files.
Source: Journal of East China University of Science and Technology (Natural Science Edition); indexed in CAS, CSCD, and the Peking University Core Journals list; 2016, No. 5, pp. 708-714 (7 pages)
Funding: National Natural Science Foundation of China (61300041, 61272198)
Keywords: HDFS; association rule mining; association of small files; prefetching
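
The pipeline summarized in the abstract (mine associations from access logs, merge associated small files, prefetch associated files on access) can be illustrated with a minimal sketch. It is not the ARMFS implementation: it assumes the audit log has already been grouped into per-client access sessions, it uses plain pairwise support and confidence counting rather than whichever mining algorithm the paper uses, and every name in it (`mine_pair_rules`, `build_prefetch_table`, `merge_groups`, the thresholds, the file names) is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def mine_pair_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Return {(src, dst): confidence} for single-file rules src -> dst.

    `sessions` is an assumed preprocessing result: one list of file
    names per access session extracted from the audit log.
    """
    n = len(sessions)
    item_count = Counter()
    pair_count = Counter()
    for session in sessions:
        files = sorted(set(session))  # ignore repeated accesses in a session
        item_count.update(files)
        pair_count.update(combinations(files, 2))

    rules = {}
    for (a, b), together in pair_count.items():
        if together / n < min_support:
            continue  # the pair is not accessed together often enough
        for src, dst in ((a, b), (b, a)):
            confidence = together / item_count[src]
            if confidence >= min_confidence:
                rules[(src, dst)] = confidence
    return rules


def build_prefetch_table(rules):
    """Map each file to the files worth prefetching, best rule first."""
    table = defaultdict(list)
    for (src, dst), confidence in rules.items():
        table[src].append((confidence, dst))
    return {
        src: [dst for _, dst in sorted(pairs, key=lambda p: (-p[0], p[1]))]
        for src, pairs in table.items()
    }


def merge_groups(rules):
    """Group files linked by any rule (union-find); each group is a
    candidate set of small files to merge into one stored file."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in rules:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for f in list(parent):
        groups[find(f)].add(f)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical sessions standing in for parsed audit-log entries.
sessions = [
    ["a.txt", "b.txt"],
    ["a.txt", "b.txt", "c.txt"],
    ["a.txt", "b.txt"],
    ["c.txt", "d.txt"],
]
rules = mine_pair_rules(sessions)
prefetch_table = build_prefetch_table(rules)
groups = merge_groups(rules)
```

With these sessions only `a.txt` and `b.txt` co-occur often enough to form rules, so they would be merged together and each would trigger a prefetch of the other. Merging associated files reduces the number of objects whose metadata the NameNode has to hold in memory, which is the effect the abstract reports.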
