Abstract
Existing community detection algorithms suffer from high time complexity and generate large numbers of duplicate cliques at run time. To address these problems, this paper introduces a binary tree storage structure, weight-based sorting, and depth-first traversal, combines them with Spark's in-memory computing, and proposes an improved parallel algorithm, S-T-CS. The algorithm is implemented on a Spark big data platform and compared with the traditional clique search (CS) algorithm and the Hadoop-based MR-T-CS algorithm. Experimental results show that S-T-CS eliminates redundant results, reduces time cost, and improves both the running speed of community detection and its capacity to process massive data.
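The abstract names weight sorting and depth-first traversal as the means of avoiding duplicate cliques but gives no details. The following is a minimal single-machine sketch of that general idea, not the S-T-CS algorithm itself. It assumes vertex degree as the weight, uses a hypothetical function name `maximal_cliques`, and omits the binary tree storage and the Spark parallelization entirely. Each clique is grown only toward higher-ranked vertices, so every maximal clique is reached along exactly one path and is reported once.

```python
from typing import Dict, List, Set


def maximal_cliques(adj: Dict[int, Set[int]]) -> List[List[int]]:
    """Enumerate each maximal clique once via ordered depth-first search.

    Assumption: the 'weight' of a vertex is its degree (ties broken by id).
    The paper's actual weighting scheme is not given in the abstract.
    """
    order = sorted(adj, key=lambda v: (len(adj[v]), v))
    rank = {v: i for i, v in enumerate(order)}
    result: List[List[int]] = []

    def extend(clique: List[int], candidates: Set[int], common: Set[int]) -> None:
        # candidates: higher-ranked vertices adjacent to every clique member
        # common:     all vertices (any rank) adjacent to every clique member
        if not common:
            # Nothing can extend the clique, so it is maximal.
            result.append(sorted(clique))
            return
        for v in sorted(candidates, key=rank.get):
            next_candidates = {
                u for u in candidates if rank[u] > rank[v] and u in adj[v]
            }
            extend(clique + [v], next_candidates, common & adj[v])

    for v in order:
        higher = {u for u in adj[v] if rank[u] > rank[v]}
        extend([v], higher, set(adj[v]))
    return result


# Small example: a triangle 1-2-3 with a pendant edge 3-4.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
cliques = maximal_cliques(graph)
```

Because the outer loop over start vertices is independent per vertex, it is the natural unit to distribute across workers in a parallel setting. How S-T-CS actually partitions the work on Spark is not stated in the abstract.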
Authors
Wang Yonggui (王永贵)
Xu Shanshan (徐山珊)
Xiao Chenglong (肖成龙)
College of Software, Liaoning Technical University, Huludao, Liaoning 125105, China
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journal (北大核心)
2018, No. 12, pp. 3648-3651, 3681 (5 pages)
Funding
Supported by the Youth Program of the National Natural Science Foundation of China (61404069)