Abstract
When digital image processing algorithms are ported to and optimized for heterogeneous computing platforms, memory access performance has become the main factor limiting system performance. To address this, this paper draws on the hardware architecture of the NVIDIA Tegra K1 and the characteristics of the specific algorithms to study the implementation and memory access optimization of the matrix transpose and Laplace filtering algorithms on the NVIDIA Tegra K1 heterogeneous computing platform. The optimizations cover coalesced and vectorized memory access and the elimination of bank and channel conflicts in global memory access. Experimental results show that, with these optimizations, both algorithms achieve a substantial improvement in memory access performance on the NVIDIA Tegra K1 platform and exhibit good real-time performance.
Source
《计算机工程》
CAS
CSCD
PKU Core (Peking University Core Journals)
2016, Issue 12, pp. 44-49 (6 pages)
Computer Engineering
Funding
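Major Research Plan of the National Natural Science Foundation of China (91420202)
General Program of the Science and Technology Plan of the Beijing Municipal Education Commission (SQKM201411417010, KM201511417003)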
Keywords
GPU optimization
memory access bandwidth
data localization
vectorization
coalesced access
Laplace filtering algorithm