Abstract
To address the performance bottleneck of the Apriori algorithm, a double-compression Apriori algorithm (Apriori double compression, Apriori_DC) is proposed. It improves efficiency in two ways. First, the transaction database is compressed repeatedly, which reduces both the number of transaction records and the total number of items. Second, the frequent itemsets are compressed, which reduces the number of candidate frequent itemsets generated in the next step. Experiments show that Apriori_DC outperforms Apriori in two settings: when the support threshold is fixed and the number of records varies, and when the number of records is fixed and the support threshold varies. They also show that the number of records in the transaction database decreases continually while Apriori_DC runs.
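The abstract does not give Apriori_DC's exact compression rules. The Python sketch below is therefore only an illustration. It combines standard level-wise Apriori with two commonly used pruning rules that match the two ideas described above, and these rules are assumptions, not the authors' published procedure. The first rule is frequent-itemset compression. An item that appears in fewer than k frequent k-itemsets cannot belong to any frequent (k+1)-itemset, so those itemsets are not used as join seeds. The second rule is transaction compression. Non-useful items are stripped from every transaction, and any transaction with fewer than k+1 items is dropped. The function name `apriori_compressed` is hypothetical, and `min_support` is taken as an absolute count.

```python
from collections import Counter
from itertools import combinations


def apriori_compressed(transactions, min_support):
    """Hypothetical sketch: Apriori plus database and frequent-itemset compression.

    The pruning rules are assumptions, not the published Apriori_DC rules.
    Returns (frequent, db_sizes): frequent maps frozenset -> support count,
    db_sizes lists how many transactions each counting pass scanned.
    """
    db = [frozenset(t) for t in transactions]
    frequent, db_sizes = {}, [len(db)]

    # Pass 1: count single items.
    counts = Counter(item for t in db for item in t)
    current = {frozenset([i]): c for i, c in counts.items() if c >= min_support}
    k = 1
    while current:
        frequent.update(current)

        # Frequent-itemset compression (assumed rule): each item of a frequent
        # (k+1)-itemset lies in k of its frequent k-subsets, so items occurring
        # in fewer than k frequent k-itemsets are useless for the next level.
        item_freq = Counter(i for s in current for i in s)
        useful = {i for i, c in item_freq.items() if c >= k}
        seeds = [s for s in current if s <= useful]
        seed_set = set(seeds)

        # Transaction compression (assumed rule): strip useless items, then
        # drop transactions too short to contain a (k+1)-itemset.
        db = [t & useful for t in db]
        db = [t for t in db if len(t) >= k + 1]

        # Join seeds sharing k-1 items; keep candidates whose k-subsets are all frequent.
        candidates = set()
        for a, b in combinations(seeds, 2):
            u = a | b
            if len(u) == k + 1 and all(
                frozenset(s) in seed_set for s in combinations(u, k)
            ):
                candidates.add(u)
        if not candidates or not db:
            break

        # Count candidate supports over the compressed database only.
        db_sizes.append(len(db))
        counts = Counter(c for t in db for c in candidates if c <= t)
        current = {c: n for c, n in counts.items() if n >= min_support}
        k += 1
    return frequent, db_sizes


if __name__ == "__main__":
    tx = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
    freq, sizes = apriori_compressed(tx, min_support=2)
    print(sizes)                        # [4, 4, 2]
    print(sorted(max(freq, key=len)))   # ['B', 'C', 'E']
```

On this four-transaction toy database, the third counting pass scans only 2 of the 4 records. This illustrates the kind of shrinkage the abstract reports, although the toy data is not the paper's experimental data.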
Authors
ZHENG Jianhua
XU Longqin
LIU Shuangyin
ZHANG Shilong
College of Information Science and Technology, Zhongkai University of Agriculture and Engineering, Guangzhou 510225, China
Source
Journal of Zhongkai University of Agriculture and Engineering (《仲恺农业工程学院学报》)
CAS
2017, No. 4, pp. 26-31 (6 pages)
Funding
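National Natural Science Foundation of China (61471133, 61571444)
Science and Technology Planning Project of Guangdong Province (2013B090600065, 2017A070712019)
Science and Technology Planning Project of Guangzhou (201704030098)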