期刊文献+

Improving vertex-frontier based GPU breadth-first search

Improving vertex-frontier based GPU breadth-first search
下载PDF
导出
摘要 Breadth-first search(BFS) is an important kernel for graph traversal and has been used by many graph processing applications. Extensive studies have been devoted in boosting the performance of BFS. As the most effective solution, GPU-acceleration achieves the state-of-the-art result of 3.3×109 traversed edges per second on a NVIDIA Tesla C2050 GPU. A novel vertex frontier based GPU BFS algorithm is proposed, and its main features are three-fold. Firstly, to obtain a better workload balance for irregular graphs, a virtual-queue task decomposition and mapping strategy is introduced for vertex frontier expanding. Secondly, a global deduplicate detection scheme is proposed to remove reduplicative vertices from vertex frontier effectively. Finally, a GPU-based bottom-up BFS approach is employed to process large frontier. The experimental results demonstrate that the algorithm can achieve 10% improvement over the state-of-the-art method on diverse graphs. Especially, it exhibits 2-3 times speedup on low-diameter and scale-free graphs over the state-of-the-art on a NVIDIA Tesla K20 c GPU, reaching a peak traversal rate of 11.2×109 edges/s. Breadth-first search(BFS) is an important kernel for graph traversal and has been used by many graph processing applications. Extensive studies have been devoted in boosting the performance of BFS. As the most effective solution, GPU-acceleration achieves the state-of-the-art result of 3.3×10^9 traversed edges per second on a NVIDIA Tesla C2050 GPU. A novel vertex frontier based GPU BFS algorithm is proposed, and its main features are three-fold. Firstly, to obtain a better workload balance for irregular graphs, a virtual-queue task decomposition and mapping strategy is introduced for vertex frontier expanding. Secondly, a global deduplicate detection scheme is proposed to remove reduplicative vertices from vertex frontier effectively. Finally, a GPU-based bottom-up BFS approach is employed to process large frontier. The experimental results demonstrate that the algorithm can achieve 10% improvement over the state-of-the-art method on diverse graphs. Especially, it exhibits 2-3 times speedup on low-diameter and scale-free graphs over the state-of-the-art on a NVIDIA Tesla K20 c GPU, reaching a peak traversal rate of 11.2×10^9 edges/s.
出处 《Journal of Central South University》 SCIE EI CAS 2014年第10期3828-3836,共9页 中南大学学报(英文版)
基金 Projects(61272142,61103082,61003075,61170261,61103193)supported by the National Natural Science Foundation of China Project supported by the Program for New Century Excellent Talents in University of China Projects(2012AA01A301,2012AA010901)supported by the National High Technology Research and Development Program of China
关键词 breadth-first search GPU graph traversal vertex frontier 广度优先搜索 GPU 顶点 NVIDIA Tesla 图形处理 BFS 负载平衡
  • 相关文献

参考文献24

  • 1ZERBINO D R, VELVET B E. Algorithms for de Novo short read assembly using de Bruijn graphs [J]. Genome Research, 2008, 18(5): 821-829.
  • 2BAKOS J D. High-performance heterogeneous computing with the convey HC-1 [J1. Computing in Science & Engineering, 2010, 12(6): 80-87.
  • 3MALEWICZ G, AUSTERN M H, BIK A J C, DEHNERT J C, HORN I, LEISER N, PREGEL C (i A system for large-scale graph processing [C]// Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. USA: ACM Press, 2010: 135-146.
  • 4KWAK H, LEE C, PARK H, MOON S. What is twitter, a social network or a news media [C]// Proceedings of the 19th International Conference on World Wide Web. USA: ACM Press, 2010: 591-600.
  • 5STRATTON J A, RODRIGUES C, SUNG I J, OBEID N, CHANG L W, ANSSARI N, LIU G D, HWU W M W. Parboil: A revised benchmark suite for scientific and commercial throughput computing [R]. Illinois, Urbana: Center for Reliable and High-Performance Computing, 2012.
  • 6Graph 500 Steering Committee. The Graph 500 List [EB/OL]. [2013 -08-15]. http://www.graph 500.org/.
  • 7AGARWAL V, PETRINI F, PASETTO D, BADER D A. Scalable graph exploration on multicore processors [C]//Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis. USA: IEEE Computer Society, 2010: 1-1 1.
  • 8GAO T, LU Y, ZHANG B, SUO K. Using MIC to accelerate a typical data-intensive application: The breadth-first search [C]// Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2013 IEEE 27th International. USA: IEEE Computer Society, 2013:1117-1125.
  • 9HONG S, KIM S K, OGUNTEB1 T, OLUKOTUN K. Accelerating CUDA graph algorithms at maximum warp [C]// Proceedings of the 16th ACM Symposium on Principles and Practice of ParallelProgramming. USA: ACM Press, 2011 : 267- 276.
  • 10ZOU D, DOU Y, GUO S, NI S. High performance sparse matrix-vector multiplication on FPGA [J]. IEICE Electronics Express, 2013, 10(17): 20130529.

二级参考文献3

共引文献1

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部