Abstract
To design a parallel optimization scheme for the fixed-order Bellman-Ford algorithm on the CUDA platform, the algorithm's compute-intensive and data-intensive characteristics are taken into account. At the kernel-computation level, a memory-access optimization method is proposed, together with a method that uses the fixed order to reduce thread divergence. At the CPU-GPU transfer level, a method based on CUDA streams is proposed to reduce data-transfer overhead. In tests on different graphics cards, thread blocks are partitioned with reference to shared-memory capacity, the vector dimension is reduced after iteration, and CUDA streams are used to shorten the latency of the first computation. Compared with the conventional algorithm, the improved parallel algorithm achieves a speedup of about 200 times. The scheme verifies that the fixed order is feasible and portable on the CUDA platform, and it can serve as a reference for multi-platform research.
Authors
张晗
钱育蓉
王跃飞
陈人和
田宸玮
ZHANG Han; QIAN Yu-rong; WANG Yue-fei; CHEN Ren-he; TIAN Chen-wei (School of Software, Xinjiang University, Urumqi 830008, China)
Source
《计算机工程与设计》
Peking University Core Journal (北大核心)
2019, No. 8, pp. 2181-2189 (9 pages)
Computer Engineering and Design
Funding
National Natural Science Foundation of China (61562086, 61462079)
Innovation Team Fund Project of Xinjiang Uygur Autonomous Region (XJEDU2017T002)