Node-level Memory Access Optimization on Intel Knights Corner (Intel Knights Corner的结点级内存访问优化)

Cited by: 2

Abstract: Traditional programming optimization (TPO) has limited effect on Intel Knights Corner (KNC), so we propose memory access optimization (MAO) for KNC. Applying MAO to a version of Diffusion 3D that had already undergone TPO improved its performance by a further 39.1%. The paper makes two contributions: 1) it proposes MAO and argues that MAO is indispensable on KNC and that TPO+MAO is the path to Ninja performance, the best optimized performance; 2) it finds that, for stencil code, intrinsic-based MAO is more efficient than compiler-based MAO. These findings can inform the optimization of large-scale applications on KNC.

Authors: Lin Xinhua (林新华), Li Shuo (李硕), Zhao Jiaming (赵嘉明), Satoshi Matsuoka (松岗聪)

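Affiliations: Center for High Performance Computing, Shanghai Jiao Tong University; Global Scientific Information and Computing Center, Tokyo Institute of Technology; Software and Services Group, Intel Corporation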

Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2015, No. 11, pp. 37-42 (6 pages)

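Funding: National High Technology Research and Development Program of China (863 Program), project "Research on Key Technologies for Application Service Optimization in High-Performance Computing Environments"; Japan Society for the Promotion of Science (JSPS) RONPAKU Fellowship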

Keywords: traditional programming optimization (TPO), Intel Knights Corner (KNC), memory access optimization (MAO), Ninja performance

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]



