Improving vertex-frontier based GPU breadth-first search


Abstract: Breadth-first search (BFS) is an important kernel for graph traversal and is used by many graph processing applications. Extensive work has been devoted to boosting its performance. As the most effective solution, GPU acceleration achieves a state-of-the-art result of 3.3×10^9 traversed edges per second on an NVIDIA Tesla C2050 GPU. A novel vertex-frontier based GPU BFS algorithm is proposed, with three main features. First, to obtain better workload balance on irregular graphs, a virtual-queue task decomposition and mapping strategy is introduced for vertex frontier expansion. Second, a global duplicate detection scheme is proposed to remove duplicate vertices from the vertex frontier effectively. Finally, a GPU-based bottom-up BFS approach is employed to process large frontiers. Experimental results demonstrate that the algorithm achieves a 10% improvement over the state-of-the-art method on diverse graphs. In particular, it exhibits a 2-3 times speedup over the state of the art on low-diameter and scale-free graphs on an NVIDIA Tesla K20c GPU, reaching a peak traversal rate of 11.2×10^9 edges/s.
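The interplay between frontier expansion, duplicate removal, and the bottom-up step can be illustrated with a small sequential sketch. This is not the GPU implementation described in the abstract: it is a CPU illustration in Python, it omits the virtual-queue load-balancing strategy entirely, and the function name `hybrid_bfs` and the switching parameter `bottom_up_fraction` are hypothetical choices made here for illustration. It assumes an undirected graph stored as an adjacency dictionary.

```python
from typing import Dict, List


def hybrid_bfs(adj: Dict[int, List[int]], source: int,
               bottom_up_fraction: float = 0.25) -> Dict[int, int]:
    """Return the BFS depth of every vertex reachable from `source`.

    `adj` maps each vertex to its neighbor list (undirected graph assumed).
    `bottom_up_fraction` is a hypothetical heuristic: when the frontier holds
    more than this fraction of all vertices, the level is processed bottom-up.
    """
    depth = {source: 0}      # doubles as the global "visited" record
    frontier = [source]
    level = 0
    n = len(adj)

    while frontier:
        level += 1
        next_frontier: List[int] = []

        if len(frontier) > bottom_up_fraction * n:
            # Bottom-up step: every unvisited vertex searches its own
            # neighbors for a parent in the current frontier and stops
            # at the first hit, so it is added at most once.
            in_frontier = set(frontier)
            for v in adj:
                if v in depth:
                    continue
                for u in adj[v]:
                    if u in in_frontier:
                        depth[v] = level
                        next_frontier.append(v)
                        break
        else:
            # Top-down step: expand each frontier vertex. The check against
            # the global `depth` record keeps a vertex reached through
            # several parents from entering the next frontier twice.
            for u in frontier:
                for v in adj[u]:
                    if v not in depth:
                        depth[v] = level
                        next_frontier.append(v)

        frontier = next_frontier

    return depth
```

Setting `bottom_up_fraction` to 0.0 forces the bottom-up step on every level, and a value of 1.0 or more forces pure top-down expansion; all settings should produce the same depths, which makes the sketch easy to check against itself.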

Authors: YANG Bo (杨博), LU Kai (卢凯), GAO Ying-hui (高颖慧), XU Kai (徐凯), WANG Xiao-ping (王小平), CHENG Zhi-quan (程志权)

Affiliations: Science and Technology on Parallel and Distributed Processing Laboratory; College of Computer; Department of Electronic Science and Engineering; Avatar Science Company

Source: Journal of Central South University (中南大学学报（英文版）), 2014, Issue 10, pp. 3828-3836 (9 pages); indexed in SCIE, EI, CAS

Funding: Projects (61272142, 61103082, 61003075, 61170261, 61103193) supported by the National Natural Science Foundation of China; Project supported by the Program for New Century Excellent Talents in University of China; Projects (2012AA01A301, 2012AA010901) supported by the National High Technology Research and Development Program of China

Keywords: breadth-first search; GPU; graph traversal; vertex frontier; vertex; NVIDIA Tesla; graphics processing; BFS; load balancing

Classification: TP391.41 [Automation and Computer Technology: Computer Application Technology]



