Abstract
The Eclat algorithm uses a vertical data representation and requires no complex data structures. However, when mining frequent itemsets, the way it generates counts through intersection consumes a large amount of memory and lowers mining efficiency. To address this, based on an analysis of Eclat and its existing improved variants, this paper proposes a CPU-parallel Eclat algorithm that stores transaction identifiers (Tids) as bits. The algorithm stores each item's Tids in binary bit form and distributes the frequent-itemset mining tasks across the CPU's threads, making the fullest possible use of the CPU's computing capacity. Experimental results show that the algorithm reduces memory usage while improving the efficiency of frequent itemset mining.
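The two ideas named in the abstract, bit-stored Tids and per-thread task distribution, can be illustrated with a minimal Python sketch. This is not the authors' implementation, which this record does not reproduce. The function names (`build_bitmaps`, `mine_from`, `parallel_eclat`) and the choice of one task per frequent single-item prefix are assumptions made for illustration. In CPython the global interpreter lock also prevents threads from running bytecode truly in parallel, so the sketch shows only how the work could be split, not the speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def build_bitmaps(transactions):
    """Map each item to an int whose bit t is set when transaction t contains it."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support(bits):
    """Support is the number of set bits in the Tid bitmap."""
    return bin(bits).count("1")


def mine_from(prefix, prefix_bits, candidates, min_sup):
    """Depth-first Eclat step: extend `prefix` with each later candidate item."""
    found = {}
    for i, (item, bits) in enumerate(candidates):
        joined = prefix_bits & bits  # Tid-set intersection as a bitwise AND
        sup = support(joined)
        if sup >= min_sup:
            itemset = prefix + (item,)
            found[itemset] = sup
            found.update(mine_from(itemset, joined, candidates[i + 1:], min_sup))
    return found


def parallel_eclat(transactions, min_sup, workers=4):
    """Hypothetical driver: one task per frequent single-item prefix (an assumption)."""
    bitmaps = build_bitmaps(transactions)
    frequent = sorted(
        (item, bits) for item, bits in bitmaps.items() if support(bits) >= min_sup
    )
    result = {(item,): support(bits) for item, bits in frequent}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(mine_from, (item,), bits, frequent[i + 1:], min_sup)
            for i, (item, bits) in enumerate(frequent)
        ]
        for future in futures:
            result.update(future.result())
    return result
```

Calling `parallel_eclat(transactions, min_sup)` with a list of item lists returns a dictionary that maps each frequent itemset, as a sorted tuple, to its support count. Because every Tid set is a single integer, an intersection costs one bitwise AND, whatever the number of Tids it holds.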
Authors
SUN Zongxin (孙宗鑫)
ZHANG Guiyun (张桂芸)
College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
PKU Core (北大核心)
2018, Issue 12, pp. 79-84 (6 pages)
Funding
National Natural Science Foundation of China, General Program (61572358)
Natural Science Foundation of Tianjin, General Program (16JCYBJC23600)