Abstract
Parallel mining of frequent patterns is an important research topic in data mining. Most parallel algorithms proposed to date are based on Apriori or FP-tree. Because of the inherent limitations of both approaches, and because they require multiple synchronization steps during computation, they perform poorly. This paper proposes PMFP, a parallel mining algorithm for distributed databases. PMFP lets each processor mine as independently as possible: each processor uses a prefix tree and a depth-first search strategy to mine its set of local frequent patterns. The algorithm then uses the relation between local and global frequent patterns to keep the set of candidate global frequent patterns as small as possible, which reduces network communication and the number of synchronization steps and thereby improves the efficiency of mining global frequent patterns.
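The abstract does not give the details of PMFP, so the following is only a minimal sketch of the general idea it describes, not the authors' algorithm. It assumes the standard partition property: a pattern that is globally frequent at a given relative support must be locally frequent, at the same relative support, on at least one site. The union of the local results therefore serves as the candidate set, and one exchange of counts settles the global answer. The function names `mine_local` and `mine_global` are hypothetical, the local miner is a simple depth-first prefix extension over transaction-id sets rather than an explicit prefix tree, and the "sites" are simulated as in-memory lists.

```python
import math
from typing import Dict, FrozenSet, List, Set, Tuple

Transaction = FrozenSet[str]
Pattern = FrozenSet[str]


def mine_local(partition: List[Transaction], min_ratio: float) -> Set[Pattern]:
    """Depth-first mining of one site's local frequent patterns (illustrative)."""
    min_count = math.ceil(min_ratio * len(partition))
    # Map each item to the ids of the local transactions that contain it.
    tidsets: Dict[str, Set[int]] = {}
    for tid, transaction in enumerate(partition):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)
    items = sorted(i for i, tids in tidsets.items() if len(tids) >= min_count)
    found: Set[Pattern] = set()

    def extend(prefix: Tuple[str, ...], prefix_tids: Set[int], start: int) -> None:
        # Grow the current prefix by one item at a time, in sorted item order.
        for k in range(start, len(items)):
            tids = prefix_tids & tidsets[items[k]]
            if len(tids) >= min_count:
                pattern = prefix + (items[k],)
                found.add(frozenset(pattern))
                extend(pattern, tids, k + 1)

    extend((), set(range(len(partition))), 0)
    return found


def mine_global(partitions: List[List[Transaction]], min_ratio: float) -> Dict[Pattern, int]:
    """Union the local results as candidates, then count them once on every site."""
    candidates: Set[Pattern] = set().union(
        *(mine_local(p, min_ratio) for p in partitions)
    )
    total = sum(len(p) for p in partitions)
    min_count = math.ceil(min_ratio * total)
    counts = {c: 0 for c in candidates}
    for partition in partitions:  # in a real system each site would count locally
        for transaction in partition:
            for c in candidates:
                if c <= transaction:
                    counts[c] += 1
    return {c: n for c, n in counts.items() if n >= min_count}


site1 = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("b")]
site2 = [frozenset("ab"), frozenset("c"), frozenset("abc"), frozenset("a")]
result = mine_global([site1, site2], min_ratio=0.5)
```

In this toy run the pattern {a, c} is locally frequent on the first site only, so it enters the candidate set but is discarded after the single counting round. The local mining phase needs no communication at all, which is the kind of saving in synchronization steps that the abstract attributes to PMFP.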
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
PKU Core Journal (北大核心)
2005, No. 25, pp. 1-3, 22 (4 pages in total)
Funding
National Natural Science Foundation of China (Grant No. 60273075)
Keywords
frequent patterns
parallel algorithm
prefix tree
global frequent pattern