Abstract
Existing graph mining algorithms have difficulty mining frequent patterns from massive graphs efficiently in a cloud environment. To address this, the paper improves the Spider Mine algorithm and proposes a cloud-based variant, c-Spider Mine. First, a minimum cut algorithm divides the large graph into several subgraphs so that partition/merge costs are minimized. Next, Spider Mine is used to mine patterns, which significantly lowers the combinatorial complexity of generating large patterns. Finally, a pattern key (PK) function is used to preserve the patterns, guaranteeing that all patterns can be successfully recovered and merged. Experiments on three real data sets show that c-Spider Mine can efficiently mine the top-K large patterns in the cloud, and that it outperforms Spider Mine in both memory usage and execution time across different data sizes and minimum support settings.
Source
Microcomputer Applications (《微型电脑应用》)
2016, No. 1, pp. 33-37 (5 pages)
Funding
University-level fund of Hefei University (14KY12ZR)
Keywords
Graph Mining
Cloud Computing
Frequent Patterns
Minimum Cut Algorithm
Pattern Key Function
Execution Time