
A non-group parallel frequent pattern mining algorithm based on conditional patterns (cited by: 1)

Chinese title: 基于条件模式的一种无分组并行频繁模式挖掘算法 (article in English)
Abstract: Frequent itemset mining is the main method of association rule mining. Given limits on computing space and performance, mining associations among frequent items in large datasets takes considerable time and effort, and the cost grows as datasets grow. In a big data environment, the MapReduce programming model is typically used to partition tasks and process them in parallel, which can improve the execution efficiency of the algorithm. However, to ensure that association rules are not destroyed during task partitioning and parallel processing, the inner-relationship data must be stored. Because these data are redundant, storing them significantly increases space usage compared with the original dataset. In this study, we find that the formation of the frequent pattern (FP) mining algorithm depends mainly on the conditional pattern bases. In parallel frequent pattern (PFP) algorithm theory, the grouping model divides frequent items into several groups according to their frequencies. We propose a non-group PFP (NG-PFP) mining algorithm that removes the grouping model and reduces data redundancy between sub-tasks. We also present how the NG-PFP algorithm handles task partitioning and parallel processing, and we analyze and discuss its performance in a Hadoop cluster environment. Experimental results indicate that the non-group model clearly improves computational efficiency and the space utilization rate.
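As a rough illustration of the conditional pattern bases that the abstract refers to, the Python sketch below derives them directly from a list of transactions, using the textbook FP-growth definition: for each frequent item, collect the prefix of more frequent items that precedes it in every support-ordered transaction. This is not the NG-PFP implementation described in the paper. The function name `conditional_pattern_bases`, the tie-breaking rule, and the toy transactions are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def conditional_pattern_bases(transactions, min_support):
    """Return {item: {prefix_path: count}} for every frequent item.

    Hypothetical helper: a single-machine sketch of the textbook notion,
    not the paper's MapReduce-based NG-PFP algorithm.
    """
    # Support of an item = number of transactions that contain it.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {item for item, count in support.items() if count >= min_support}

    bases = defaultdict(Counter)
    for t in transactions:
        # Keep only frequent items, ordered by descending support.
        # Ties are broken alphabetically (an assumption, for determinism).
        ordered = sorted(
            (item for item in set(t) if item in frequent),
            key=lambda item: (-support[item], item),
        )
        # The items before position `pos` form one conditional pattern
        # (prefix path) for the item at `pos`.
        for pos, item in enumerate(ordered):
            prefix = tuple(ordered[:pos])
            if prefix:
                bases[item][prefix] += 1
    return {item: dict(paths) for item, paths in bases.items()}


# Toy data, invented for this example.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
    ["a", "d"],
]
bases = conditional_pattern_bases(transactions, min_support=2)
for item, paths in sorted(bases.items()):
    print(item, paths)
```

Because each item's conditional pattern base can be assembled from transactions independently of the other items, a map step could emit (item, prefix) pairs and a reduce step could aggregate them per item. Whether NG-PFP partitions the work exactly this way is not stated in the abstract.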
Source: Frontiers of Information Technology & Electronic Engineering (indexed in SCIE, EI, CSCD), 2019, No. 9, pp. 1234-1245 (12 pages)
Funding: Project supported by the Fundamental Research Funds for the Central Universities, China (No. 2412015KJ005); the Twelfth Five-Year Plan of the Education Department of Jilin Province, China (No. 557); and the Thirteenth Five-Year Plan for Scientific Research of the Education Department of Jilin Province, China (No. JJKH20191197KJ)
Keywords: Frequent pattern mining; Parallel algorithm; Conditional pattern bases; MapReduce; Big data