
Research on Association Rule Mining Algorithms Based on Hadoop: Taking the Apriori Algorithm as an Example (Cited by: 18)
Abstract: Traditional association rule mining algorithms cannot meet the demands of big data mining in terms of mining efficiency and scalability. Taking the classic Apriori algorithm as an example, this paper first parallelizes the algorithm on the Hadoop platform using the MapReduce programming model, and then optimizes it with a transaction-reduction strategy to further improve mining efficiency. A Hadoop cluster is set up to evaluate both the mining results and the mining efficiency through four groups of experiments: verification of the parallel mining results, an efficiency comparison between the serial and parallel versions, mining time as a function of the number of nodes, and mining time as a function of data volume. The results show that the implemented Apriori algorithm mines frequent itemsets accurately and offers higher mining performance and better scalability than the traditional serial algorithm. It is better suited to the requirements of large datasets and can efficiently mine frequent itemsets and association rules from large-scale data.
Source: Computer Technology and Development (《计算机技术与发展》), 2016, No. 7, pp. 1-5 (5 pages)
Funding: National Natural Science Foundation of China, General Program (71473114)
Keywords: data mining; association rules; Hadoop; Apriori
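
The abstract describes a MapReduce-style parallel Apriori with transaction reduction, but this record does not include the paper's implementation. The following is a minimal single-process sketch of that general idea, not the authors' Hadoop code: each partition stands in for one mapper's input split, and all function names (`map_count`, `reduce_counts`, `generate_candidates`, `apriori`) are hypothetical. It assumes items are strings and `min_support` is an absolute count.

```python
from collections import Counter
from itertools import combinations


def map_count(partition, candidates):
    """Map step: count how often each candidate occurs in one partition."""
    counts = Counter()
    for transaction in partition:
        for cand in candidates:
            if cand <= transaction:  # candidate is a subset of the transaction
                counts[cand] += 1
    return counts


def reduce_counts(partial_counts, min_support):
    """Reduce step: sum per-partition counts, keep itemsets with enough support."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return {itemset: n for itemset, n in total.items() if n >= min_support}


def generate_candidates(frequent, k):
    """Build k-itemsets whose (k-1)-subsets are all frequent (Apriori property)."""
    items = sorted({item for itemset in frequent for item in itemset})
    return [
        frozenset(c)
        for c in combinations(items, k)
        if all(frozenset(s) in frequent for s in combinations(c, k - 1))
    ]


def apriori(partitions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    partitions = [[frozenset(t) for t in p] for p in partitions]
    candidates = list({frozenset([i]) for p in partitions for t in p for i in t})
    result, k = {}, 1
    while candidates:
        partial = [map_count(p, candidates) for p in partitions]
        frequent = reduce_counts(partial, min_support)
        if not frequent:
            break
        result.update(frequent)
        # Transaction reduction (assumed variant): a transaction that is too
        # short, or that contains no frequent k-itemset, cannot contain any
        # frequent (k+1)-itemset, so it is dropped before the next pass.
        partitions = [
            [t for t in p if len(t) > k and any(f <= t for f in frequent)]
            for p in partitions
        ]
        k += 1
        candidates = generate_candidates(frequent, k)
    return result


if __name__ == "__main__":
    splits = [
        [{"bread", "milk"}, {"bread", "diaper", "beer", "eggs"}],
        [{"milk", "diaper", "beer", "cola"},
         {"bread", "milk", "diaper", "beer"},
         {"bread", "milk", "diaper", "cola"}],
    ]
    for itemset, support in sorted(apriori(splits, 3).items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), support)
```

In a real Hadoop job the map and reduce steps would run as separate tasks over HDFS splits, with one job per itemset size; the brute-force candidate generation here is chosen for brevity rather than speed.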

