Abstract
Graph sparsification is a key data pre-processing operation in graph clustering analysis and has attracted wide attention in the database community. As graph data becomes increasingly common and grows in scale, this paper proposes a MapReduce-based sparsification algorithm for large-scale graphs, the MR-GSpar algorithm. Built on the MapReduce parallel computing framework, MR-GSpar parallelizes the traditional Minhash algorithm so that large-scale graph data can be sparsified efficiently in a distributed cluster environment. Experiments on real datasets show that the algorithm is feasible and effective.
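The abstract does not give the details of MR-GSpar, so the following is only a minimal single-machine sketch of how a Minhash-based sparsification could be organized into map and reduce steps. It rests on several assumptions that are not stated in the source: edges are ranked by the estimated Jaccard similarity of their endpoints' neighbor sets, each node keeps its ceil(d^e) most similar edges (d is the node's degree), node ids are non-negative integers, and every node has at least one neighbor. All function names and parameters (`make_hashes`, `map_signature`, `map_edges`, `reduce_top_edges`, `sparsify`, `k`, `exponent`) are hypothetical.

```python
import math
import random
from collections import defaultdict

PRIME = 2_147_483_647  # modulus for the linear hash functions (assumed)

def make_hashes(k, seed=0):
    """Build k hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    params = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]
    return [lambda x, a=a, b=b: (a * x + b) % PRIME for a, b in params]

def map_signature(node, neighbors, hashes):
    """Map step: emit (node, minhash signature of its neighbor set)."""
    return node, tuple(min(h(v) for v in neighbors) for h in hashes)

def map_edges(adj, sigs):
    """Map step: emit (u, (v, estimated similarity)) for every edge u-v."""
    for u, nbrs in adj.items():
        for v in nbrs:
            matches = sum(1 for x, y in zip(sigs[u], sigs[v]) if x == y)
            yield u, (v, matches / len(sigs[u]))

def reduce_top_edges(u, scored, exponent):
    """Reduce step: keep the ceil(d^exponent) most similar edges of node u."""
    keep = max(1, math.ceil(len(scored) ** exponent))
    ranked = sorted(scored, key=lambda t: (-t[1], t[0]))
    return [(u, v) for v, _ in ranked[:keep]]

def sparsify(adj, k=64, exponent=0.5, seed=0):
    """Run the map/shuffle/reduce phases in-process and return kept edges."""
    hashes = make_hashes(k, seed)
    sigs = dict(map_signature(u, nbrs, hashes) for u, nbrs in adj.items())
    grouped = defaultdict(list)          # stands in for the shuffle phase
    for u, item in map_edges(adj, sigs):
        grouped[u].append(item)
    kept = set()
    for u, scored in grouped.items():
        for a, b in reduce_top_edges(u, scored, exponent):
            kept.add((min(a, b), max(a, b)))  # store undirected edges once
    return kept
```

Calling `sparsify(adj)` on an undirected adjacency dictionary (each edge listed in both directions) returns a set of `(u, v)` pairs with `u < v`. In an actual MapReduce job the signature computation and the per-node ranking would run as separate distributed jobs; here the shuffle is simulated with a dictionary.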
Source
《计算机科学》
CSCD
PKU Core Journal (北大核心)
2013, No. 10, pp. 190-193, 212 (5 pages in total)
Computer Science
Funding
Supported by the National Natural Science Foundation of China (61070031, 61070032)