摘要
频繁子图挖掘是数据挖掘领域的一个重要问题,并且有着广泛的应用。在Hadoop平台上实现了一种基于MapReduce的高效频繁子图挖掘算法Cloud-GFSG(cloud-global frequent subgraph)。该算法基于Apriori思想,在扩展边生成新的子图时,使用已经挖掘出的k-1阶的频繁子图生成k阶的频繁子图。同时,检查是否存在待扩展生成的子图,设定生成的频繁子图表示规则,保证了频繁子图信息的唯一性。较同类算法相比,该算法在挖掘频繁子图时更具通用性,并且在扩展边时避免产生大量的复制图,从而使得算法的正确性得以保证,且运行效率显著提高。
Frequent subgraph mining is an important problem in data mining domain and has been used widely. This paper proposes an efficient algorithm Cloud-GFSG (cloud-global frequent subgraph), by using MapReduce on Hadoop platform for mining frequent subgraphs. The algorithm is based on the principle of Apriori. It uses the discovered frequent subgraphs whose support is k-1 to generate the candidate frequent subgraphs whose support is k when it generates new subgraphs by extending edge. Meanwhile, it checks whether there exists any subgraph which would be gener-ated and sets the frequent subgraph generation rules to ensure the uniqueness of the frequent subgraphs. Compared with the state-of-the-art algorithms, the proposed algorithm has more general function and can avoid generating replicate graphs while extending a new edge. Therefore, its correctness can be ensured and the efficiency had been improved significantly.
出处
《计算机科学与探索》
CSCD
2014年第7期790-801,共12页
Journal of Frontiers of Computer Science and Technology
基金
国家自然科学基金(61173093
61202182)
陕西省自然科学基金(2013JM8019
2014JQ8359)
中国博士后科学基金(2012M521776)
中央高校基本科研业务费专项资金(K50510230001
BDY10)~~