A non-group parallel frequent pattern mining algorithm based on conditional patterns (times cited: 1)



Abstract: Frequent itemset mining is the main method of association rule mining. Because computing space and performance are limited, mining associations among frequent items in large datasets requires considerable time and effort, particularly as datasets keep growing. For association mining in a big data environment, the MapReduce programming model is typically used to partition tasks and process them in parallel, which can improve the execution efficiency of the algorithm. However, to ensure that association rules are not destroyed during task partitioning and parallel processing, the inner-relationship data must also be stored. Because these inner-relationship data are redundant, storing them significantly increases space usage compared with the original dataset. In this study, we find that the frequent pattern (FP) mining algorithm depends mainly on the conditional pattern bases. In the parallel frequent pattern (PFP) algorithm, a grouping model divides frequent items into several groups according to their frequencies. We propose a non-group PFP (NG-PFP) mining algorithm that eliminates the grouping model and reduces data redundancy between sub-tasks. We also describe how NG-PFP partitions tasks and processes them in parallel, and we analyze and discuss its performance in a Hadoop cluster environment. Experimental results indicate that the non-group model clearly improves both computational efficiency and space utilization.
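The abstract does not give NG-PFP's internals, so the following is only a minimal single-process sketch of the two background ideas it builds on, under stated assumptions. It uses the standard FP-growth notion of a conditional pattern base: for each frequent item, the prefix paths that precede it in frequency-ordered transactions. It also shows a PFP-style grouped sharding step, in which each transaction emits one prefix per item group. All function names are hypothetical. The grouping rule (contiguous chunks of the frequency-ordered item list) is an assumption chosen for illustration. None of this is the authors' NG-PFP code, and nothing here runs on Hadoop. The `shipped_items` count shows how the data sent to sub-tasks can exceed the size of the original dataset, which is the redundancy the abstract describes.

```python
from collections import Counter, defaultdict


def frequency_order(transactions, min_support):
    """Return frequent items sorted by descending support (ties broken by item name)."""
    # Count each item at most once per transaction.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [i for i, c in counts.items() if c >= min_support]
    return sorted(frequent, key=lambda i: (-counts[i], i))


def ordered_path(transaction, rank):
    """Keep frequent items only, deduplicate, and sort by global frequency rank."""
    return sorted({i for i in transaction if i in rank}, key=rank.__getitem__)


def conditional_pattern_bases(transactions, min_support):
    """Map each frequent item to the prefix paths that precede it (its conditional pattern base)."""
    rank = {item: r for r, item in enumerate(frequency_order(transactions, min_support))}
    bases = defaultdict(list)
    for t in transactions:
        path = ordered_path(t, rank)
        for j in range(1, len(path)):  # the first item has an empty prefix, so skip it
            bases[path[j]].append(tuple(path[:j]))
    return dict(bases)


def pfp_group_shards(transactions, min_support, num_groups):
    """PFP-style sharding: each transaction emits one prefix per item group it touches.

    Hypothetical grouping rule: split the frequency-ordered item list into
    contiguous chunks of (roughly) equal size.
    """
    order = frequency_order(transactions, min_support)
    rank = {item: r for r, item in enumerate(order)}
    size = max(1, -(-len(order) // num_groups))  # ceiling division
    group_of = {item: r // size for item, r in rank.items()}
    shards = defaultdict(list)
    for t in transactions:
        path = ordered_path(t, rank)
        emitted = set()
        # Walk from the least frequent item; the first item seen in a group
        # determines the longest prefix that group needs.
        for j in range(len(path) - 1, -1, -1):
            g = group_of[path[j]]
            if g not in emitted:
                emitted.add(g)
                shards[g].append(tuple(path[: j + 1]))
    return dict(shards)


def shipped_items(shards):
    """Total number of item occurrences sent to all sub-tasks."""
    return sum(len(p) for paths in shards.values() for p in paths)


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c", "d"], ["b", "c"], ["a", "b", "c", "d"]]
    print(conditional_pattern_bases(data, min_support=2))
    # The original data hold 14 item occurrences; more groups ship more copies.
    print([shipped_items(pfp_group_shards(data, 2, q)) for q in (1, 2, 4)])  # [14, 20, 28]
```

In this toy run, one group ships exactly the original 14 item occurrences. Splitting items into 2 or 4 groups ships 20 or 28, because overlapping prefixes are copied into several shards. This measurement is only an illustration of the storage overhead named in the abstract. It does not reproduce the paper's experiments.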

Authors: Zhe-jun KUANG, Hang ZHOU, Dong-dai ZHOU, Jin-peng ZHOU, Kun YANG

Affiliations: College of Computer Science and Technology; School of Economics; School of Information Science and Technology; Division of Engineering Science; School of Computer Science and Electronic Engineering

Source: Frontiers of Information Technology & Electronic Engineering (indexed in SCIE, EI, CSCD), 2019, Issue 9, pp. 1234-1245 (12 pages). Chinese title: 信息与电子工程前沿（英文版）

Funding: Project supported by the Fundamental Research Funds for the Central Universities, China (No. 2412015KJ005); the Twelfth Five-Year Plan of the Education Department of Jilin Province, China (No. 557); and the Thirteenth Five-Year Plan for Scientific Research of the Education Department of Jilin Province, China (No. JJKH20191197KJ)

Keywords: Frequent pattern mining; Parallel algorithm; Conditional pattern bases; MapReduce; Big data

CLC classification number: TP301 [Automation and Computer Technology: Computer System Architecture]

