摘要
针对最大频繁项目集挖掘算法(DMFIA)当候选项目集维数高而最大频繁项目集维数较低的情况下要产生大量的候选项目集的缺点,提出了一种改进的基于频繁模式树(FP-tree)结构的最大频繁项目集挖掘算法——FPMFIA。该算法根据FP-tree的项目头表,采用自底向上的搜索策略逐层挖掘最大频繁项目集,从而加速每次对候选集计数的操作。在挖掘时根据每层的条件模式基产生维数较低的非频繁项目集,尽早对候选项目集进行剪枝和降维,可大量减少候选项目集的数量。同时在挖掘时充分利用最大频繁项集的性质,减少搜索空间。通过算法在不同支持度下挖掘时间的对比可知,算法FP-MFIA在最小支持度较低的情况下时间效率是DMFIA以及基于降维的最大频繁模式挖掘算法(BDRFI)的2倍以上,说明FP-MFIA在候选集维数较高的时候优势明显。
Focusing on the drawback that Discovering Maximum Frequent Itemsets Algorithm (DMFIA) has to generate lots of maximum frequent candidate itemsets in each dimension when given datasets with many candidate items and each maximum frequent itemset is not long, an improved Algorithm for mining Maximum Frequent Itemsets based of Frequent- Pattern tree (FP-MFIA) for mining maximum frequent itemsets based on FP-tree was proposed. According to Htable of FP- tree, this algorithm used bottom-up searches to mine maximum frequent itemsets, thus accelerated the count of candidates. Producing infrequent itemsets with lower dimension according to conditional pattern base of every layer when mining, cutting and reducing dimensions of candidate itemsets can largely reduce the amount of candidate itemsets. At the same time taking full advantage of properties of maximum frequent itemsets will reduce the search space. The time efficiency of FP-MFIA is at least two times as much as the algorithm of DMFIA and BDRFI ( algorithm for mining frequent itemsets based on dimensionality reduction of frequent itemset) according to computational time contrast based on different supports. It shows that FP-MFIA has a clear advantage when candidate itemsets are with high dimension.
出处
《计算机应用》
CSCD
北大核心
2015年第3期775-778,共4页
journal of Computer Applications
基金
国家科技支撑计划项目(2012BAF12B08)
关键词
最大频繁项集
频繁模式树
数据挖掘
关联规则
非频繁项集
maximum frequent itemset
Frequent Pattern tree (FP-tree)
data mining
association rule
infrequentitemset