Abstract
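Combining dynamic itemset counting with sampling, and using a meta-learning strategy to generate frequent itemsets, this paper proposes DASM, a distributed association rule mining algorithm that does not require shared memory. The concept of similarity degree is introduced and used to improve mining accuracy. Theoretical analysis and experiments on datasets produced by the IBM data generator both show that DASM offers high mining efficiency and low communication volume, making it suitable for applications that demand high efficiency.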
A new distributed association rule mining algorithm, DASM, is presented. It combines the ideas of dynamic itemset counting and sampling, and generates frequent itemsets using a meta-learning method. Sites running DASM do not need to share memory. To help ensure the completeness of the results, the concept of similarity degree is introduced. Theoretical analysis and experiments on datasets produced by the generator from the IBM Almaden Quest research group show that DASM achieves better performance and a lower communication load. DASM is suited to applications where efficiency may matter more than the accuracy of the results.
Source
《计算机应用》
CSCD
PKU Core Journal (北大核心)
2006, Issue 4, pp. 872-874, 877 (4 pages in total)
Journal of Computer Applications
Funding
Supported by the Natural Science Foundation of Henan Province (0211050110)
Keywords
sampling (抽样)
meta-learning (元学习)
dynamic itemset counting (动态项集计数)
similarity degree (相似度)
distributed association rule mining (分布式关联规则挖掘)