Experimental Method for Optimizing GPGPU Performance in a Multiple-Kernel Environment Based on GPGPU-sim

Original title: 基于GPGPU-sim的多kernel场景下GPGPU性能优化实验方法

Abstract: This paper presents an experimental method for optimizing GPGPU performance in multi-kernel environments based on GPGPU-sim. It aims to give beginners a reference method for research on GPGPU performance optimization in multi-kernel scenarios, and to provide a case study for teaching computer architecture. The paper focuses on the design rationale, experimental method, and experimental process of an improved adaptive thread block scheduling method for multi-kernel scenarios, implemented on the GPGPU-sim simulator. It also introduces the GPGPU microarchitecture, the GPGPU-sim simulator, and the simulator's source code structure. The experimental results show that the method described is feasible and that, relative to the baseline method, the proposed strategy improves GPGPU execution efficiency in multi-kernel scenarios.

[Objective] With the rapid development and continuous improvement of the parallel computing architecture of general-purpose graphics processing units (GPGPUs), their computing power has increased significantly, making them essential in high-performance and high-throughput applications. However, as tasks grow in number and complexity, multi-kernel execution environments face serious challenges, so optimizing GPGPU performance in such environments is crucial. Researchers often use GPGPU-sim as the main tool for studying GPGPU performance optimization methods. Even so, there is currently no comprehensive guide to conducting GPGPU performance optimization experiments with GPGPU-sim in multi-kernel environments, which makes experimental verification and analysis difficult for beginners. Furthermore, while the round-robin (RR) scheduling strategy ensures fair resource utilization, it may introduce scheduling delays between kernels in concurrent execution environments. This study aims to provide key experimental methods for beginners optimizing GPGPU performance in multi-kernel concurrent execution environments and to offer a useful case reference for teaching computer architecture.

[Methods] First, the article introduces the GPGPU architecture in detail and explores the source code structure of the GPGPU-sim simulator, giving readers the relevant background. It then analyzes and discusses the design rationale and algorithm of the proposed adaptive thread block (ATB) scheduling strategy. The article describes how to modify the GPGPU-sim source code so that thread blocks from multiple kernels are scheduled under the ATB strategy. In addition, so that beginners can easily replicate the experiments, it explains the GPGPU-sim configuration parameters and the modifications made to the test programs.

[Results] The article compares the ATB strategy with the baseline RR thread block scheduling method, analyzing the experimental results in terms of system performance, shared memory utilization, register utilization, and memory access efficiency. In terms of system performance, the ATB strategy enables concurrent execution of multiple kernels, which improves resource utilization on the GPGPU and thereby significantly improves overall execution performance. Compared to RR, ATB improves execution efficiency by up to 76%, with an average system performance improvement of 45%. In terms of shared memory and register utilization, the ATB strategy allows threads from multiple kernels to access GPGPU resources concurrently, improving the utilization of these resources. Compared to RR, shared memory usage under ATB increased by up to 84%, with an average increase of 54%, and register usage increased by an average of 29%, with a maximum increase of 49%. Regarding memory access efficiency, ATB allows threads from different kernels to access different storage resources, reducing the probability that threads compete for the same resource. Compared to RR, pipeline stall cycles under ATB decreased by an average of 5%, while the cycles in which warps wait for data were reduced by up to 44% and by 29% on average. Overall, compared to the baseline method, the proposed ATB strategy improves the efficiency of concurrent multi-kernel execution and GPGPU performance.

[Conclusions] This article provides an in-depth analysis and discussion of GPGPU performance optimization methods using GPGPU-sim in multi-kernel environments, and designs and implements an ATB scheduling strategy. By adopting the improved ATB scheduling strategy in the GPGPU-sim simulator, the study achieved concurrent execution of multiple kernels and verified with experimental data that the strategy improves GPGPU performance. The work provides detailed and feasible experimental methods for beginners and offers a reference case for teaching computer architecture.
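The abstract attributes ATB's gains to thread blocks from several kernels sharing GPGPU resources concurrently. The following is a minimal sketch of that general idea only. It is not the paper's ATB algorithm and not GPGPU-sim code. All names (`SM`, `Kernel`, `fill_shared`) and all resource figures are hypothetical, and the sketch models just two per-SM resources, registers and shared memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SM:
    """Hypothetical streaming multiprocessor with two limited resources."""
    regs: int   # total registers available
    smem: int   # total shared memory in bytes


@dataclass(frozen=True)
class Kernel:
    """Hypothetical kernel described by its per-thread-block demands."""
    name: str
    regs_per_block: int
    smem_per_block: int


def max_blocks(sm: SM, k: Kernel) -> int:
    """Thread blocks of one kernel that fit when it has the SM to itself."""
    return min(sm.regs // k.regs_per_block, sm.smem // k.smem_per_block)


def utilization(sm: SM, placement: dict) -> tuple:
    """Return (register utilization, shared memory utilization) in [0, 1]."""
    regs = sum(k.regs_per_block * n for k, n in placement.items())
    smem = sum(k.smem_per_block * n for k, n in placement.items())
    if regs > sm.regs or smem > sm.smem:
        raise ValueError("placement exceeds SM resources")
    return regs / sm.regs, smem / sm.smem


def fill_shared(sm: SM, kernels: list) -> dict:
    """Greedy illustration: cycle over the kernels, placing one block at a
    time, until no kernel's next block fits in the remaining resources."""
    placement = {k: 0 for k in kernels}
    regs_left, smem_left = sm.regs, sm.smem
    progress = True
    while progress:
        progress = False
        for k in kernels:
            if k.regs_per_block <= regs_left and k.smem_per_block <= smem_left:
                placement[k] += 1
                regs_left -= k.regs_per_block
                smem_left -= k.smem_per_block
                progress = True
    return placement


if __name__ == "__main__":
    # Made-up numbers: A is register-heavy, B is shared-memory-heavy.
    sm = SM(regs=65536, smem=49152)
    a = Kernel("A", regs_per_block=16384, smem_per_block=2048)
    b = Kernel("B", regs_per_block=4096, smem_per_block=16384)

    print("A alone:", utilization(sm, {a: max_blocks(sm, a)}))
    print("B alone:", utilization(sm, {b: max_blocks(sm, b)}))
    mixed = fill_shared(sm, [a, b])
    print("A+B mixed:", {k.name: n for k, n in mixed.items()},
          utilization(sm, mixed))
```

With these made-up numbers, kernel A alone uses all registers but only about 17% of shared memory, and kernel B alone shows the reverse pattern. The mixed placement (3 blocks of A and 2 of B) uses 87.5% of the registers and about 79% of the shared memory. A real scheduler must also account for thread and warp slots, block-count limits, and run-time behavior, which this sketch ignores.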

Authors: ZHANG Jun (张军); WEI Jizhen (魏继桢); SHEN Fanfan (沈凡凡); TAN Hai (谭海); HE Yanxiang (何炎祥) (School of Information Engineering, East China University of Technology, Nanchang 330013, China; School of Information Engineering, Nanjing Audit University, Nanjing 211815, China; Computer School, Wuhan University, Wuhan 430072, China)

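Affiliations: School of Information Engineering, East China University of Technology; School of Information Engineering, Nanjing Audit University; Computer School, Wuhan University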

Source: Experimental Technology and Management (《实验技术与管理》), 2024, No. 7, pp. 87-93 (7 pages). Indexed in: CAS; Peking University Core Journals (北大核心).

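Funding: National Natural Science Foundation of China (62162002, 61662002, 61902189); Natural Science Foundation of Jiangxi Province (20212BAB202002); Basic Science (Natural Science) Research Project of Higher Education Institutions of Jiangsu Province (22KJA520004).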

Keywords: multiple-kernel scenario; GPGPU; GPGPU-sim; performance optimization

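Classification numbers: TP303 [Automation and Computer Technology: Computer Architecture]; TP333 [Automation and Computer Technology: Computer Architecture]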
