Abstract
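To address the problem of efficiently mining association rules and updating them incrementally in big-data environments, this paper first proposes, on the basis of the original fast updating pruning (FUP) algorithm, a matrix-based incremental updating method for association rules (MFUP). The method converts the dataset into a Boolean matrix, reducing both the number of dataset scans and the storage the dataset requires. MFUP is then combined with the Hadoop distributed computing framework to give a new algorithm for distributed environments, Cloud MFUP (CMFUP). Finally, comparative experiments are designed and analyzed. The results show that, when mining association rules and updating them incrementally on the same amount of data, MFUP takes less execution time than FUP, and its execution time grows more slowly as the dataset increases. A comparison of CMFUP with MRFUP shows that, as the dataset grows in the distributed environment, the former has a shorter execution time and a slower growth rate than the latter.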
To mine association rules efficiently and update them incrementally in big-data settings, we first discuss a series of improved algorithms based on the fast updating pruning (FUP) algorithm, and then propose a new matrix FUP (MFUP) algorithm for matrix-based incremental updating of association rules. The proposed method transforms the dataset into a Boolean matrix, which reduces both the number of dataset scans and the storage space required. An experimental study of incremental updating of frequent items verified that MFUP takes less time than FUP when mining association rules and updating increments for the same amount of data. In addition, as the dataset grows, the execution time of MFUP increases more slowly. A second experiment indicated that the time required by both algorithms decreased as the support threshold increased. Furthermore, by introducing the Hadoop platform into MFUP when updating the matrix of incremental datasets, an improved cloud MFUP (CMFUP) algorithm for distributed computing environments is proposed. As the dataset grows in the distributed environment, CMFUP takes less time than the MapReduce FUP (MRFUP) algorithm, and its execution time also increases more slowly. In addition, as the number of cluster datanodes increases, the time required decreases.
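The Boolean-matrix idea and the FUP-style update can be illustrated with a minimal sketch. This is not the authors' MFUP implementation: the function names, the brute-force candidate enumeration, and the `max_size` limit are all assumptions made for illustration. The sketch relies only on the standard FUP observation that an itemset infrequent in both the old data and the increment cannot be frequent in their union.

```python
from itertools import combinations

def to_boolean_matrix(transactions, items):
    """One row per transaction, one column per item; True where the item occurs."""
    return [[item in t for item in items] for t in transactions]

def support_count(matrix, items, itemset):
    """Count the rows in which every column of the itemset is True."""
    cols = [items.index(i) for i in itemset]
    return sum(all(row[c] for c in cols) for row in matrix)

def frequent_itemsets(matrix, items, min_support, max_size=2):
    """Brute-force enumeration (hypothetical helper); returns {itemset: count}."""
    threshold = min_support * len(matrix)
    result = {}
    for k in range(1, max_size + 1):
        for itemset in combinations(items, k):
            count = support_count(matrix, items, itemset)
            if count >= threshold:
                result[itemset] = count
    return result

def fup_update(old_matrix, old_frequent, inc_matrix, items, min_support, max_size=2):
    """FUP-style update: reuse old counts and touch the old matrix only when needed."""
    threshold = min_support * (len(old_matrix) + len(inc_matrix))
    updated = {}
    # Case 1: previously frequent itemsets only need their increment count added.
    for itemset, old_count in old_frequent.items():
        count = old_count + support_count(inc_matrix, items, itemset)
        if count >= threshold:
            updated[itemset] = count
    # Case 2: a previously infrequent itemset can become frequent only if it is
    # frequent in the increment, so only those candidates are checked against
    # the old matrix.
    inc_frequent = frequent_itemsets(inc_matrix, items, min_support, max_size)
    for itemset, inc_count in inc_frequent.items():
        if itemset in old_frequent:
            continue
        count = inc_count + support_count(old_matrix, items, itemset)
        if count >= threshold:
            updated[itemset] = count
    return updated

items = ["a", "b", "c"]
old = to_boolean_matrix([{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}], items)
inc = to_boolean_matrix([{"b", "c"}, {"b", "c"}], items)
old_frequent = frequent_itemsets(old, items, 0.5)
updated = fup_update(old, old_frequent, inc, items, 0.5)
```

In this sketch the update yields the same itemsets as recomputing from scratch on the combined matrix, while rescanning the old matrix only for the candidates that are newly frequent in the increment.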
Authors
GENG ZhiQiang (耿志强)
ZHANG Yang (张杨)
HAN YongMing (韩永明)
College of Information Science and Technology; Engineering Research Center of Intelligent PSE, Ministry of Education, Beijing University of Chemical Technology, Beijing 100029, China
Source
Journal of Beijing University of Chemical Technology (Natural Science Edition) (《北京化工大学学报(自然科学版)》)
Indexed in: CAS, CSCD, PKU Core (北大核心)
2016, No. 5, pp. 89-94 (6 pages)
Funding
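National Natural Science Foundation of China (61374166)
Beijing Natural Science Foundation (4162045)
Doctoral Program Fund of the Ministry of Education (20120010110010)
Fundamental Research Funds for the Central Universities (JD1502)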