Research on Memory Access Optimization of the NVIDIA Tegra K1 Heterogeneous Computing Platform (cited by: 3)
Abstract: When digital image processing algorithms are ported to and optimized for heterogeneous computing platforms, memory access performance has become the main factor limiting system performance. To address this, the paper combines the hardware architecture characteristics of the NVIDIA Tegra K1 with the characteristics of the specific algorithms, and studies the implementation and memory access optimization of the matrix transpose and Laplacian filtering algorithms on the NVIDIA Tegra K1 heterogeneous computing platform. The optimizations covered include coalesced and vectorized memory access and the elimination of bank and channel conflicts in global memory access. Experimental results show that, with these optimizations, both algorithms achieve a substantial improvement in memory access performance on the NVIDIA Tegra K1 platform and have good real-time performance.
Source: Computer Engineering (《计算机工程》), 2016, No. 12, pp. 44-49 (6 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
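Funding: Major Research Plan of the National Natural Science Foundation of China (91420202); General Program of the Science and Technology Plan of the Beijing Municipal Education Commission (SQKM201411417010, KM201511417003).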
Keywords: GPU optimization; memory access bandwidth; data localization; vectorization; coalesced access; Laplacian filtering algorithm
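For readers unfamiliar with the two algorithms named in the abstract, the following is a minimal CPU reference sketch of what each one computes. It is not the paper's GPU implementation: the function names, the tile size `TILE`, the choice of the 4-neighbour Laplacian kernel, and the zeroed border are all assumptions made for illustration. The tiled loop only mirrors the block structure that a GPU kernel would typically use to keep reads and writes contiguous; it does not itself demonstrate coalescing or bank-conflict behaviour.

```python
# Reference (CPU) sketches of the two algorithms named in the abstract.
# All names and parameters here are hypothetical, chosen for illustration.

TILE = 4  # assumed tile edge; real GPU kernels commonly use 16 or 32


def transpose_tiled(matrix, tile=TILE):
    """Transpose a rows x cols matrix one tile at a time."""
    rows, cols = len(matrix), len(matrix[0])
    out = [[0] * rows for _ in range(cols)]
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            r1 = min(r0 + tile, rows)
            c1 = min(c0 + tile, cols)
            # Stage one tile in a small buffer (the role on-chip memory
            # would play on a GPU), reading contiguous row segments.
            block = [matrix[r][c0:c1] for r in range(r0, r1)]
            # Write the tile back transposed, again as contiguous
            # row segments of the output.
            for j in range(c1 - c0):
                out[c0 + j][r0:r1] = [block[i][j] for i in range(r1 - r0)]
    return out


def laplacian_4(image):
    """Apply the 4-neighbour Laplacian to interior pixels.

    Border pixels are left at 0 (an assumed boundary policy).
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = (
                image[y - 1][x] + image[y + 1][x]
                + image[y][x - 1] + image[y][x + 1]
                - 4 * image[y][x]
            )
    return out
```

A reference like this is mainly useful as a correctness check: the output of an optimized GPU kernel can be compared element by element against it on small inputs.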