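Abstract: Linpack is the industry-recognized benchmark for measuring the actual computing performance of high-performance computing (HPC) clusters. Running Linpack on a cluster not only reveals its real computing capability but also helps identify performance bottlenecks, so that targeted improvements can be made. Because the input parameters strongly affect the measured peak performance, choosing them is laborious; in the past, parameters were configured from experience and adjusted by repeated trial until a reasonably satisfactory result was obtained. Taking the HPC cluster of Huazhong University of Science and Technology as the test object, the authors run Linpack with the HPL (High-Performance Linpack) software and, based on an in-depth analysis of how the Linpack test works, propose a standard test scheme. Following the steps of this scheme greatly reduces the number of blind test runs and quickly yields a satisfactory performance efficiency.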
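To illustrate why parameter choice matters, the sketch below computes a starting point for three of the main HPL inputs (problem size N, block size NB, and process grid P x Q) from widely cited rules of thumb: size the matrix to fill roughly 80% of aggregate memory, round N down to a multiple of NB, and pick a near-square grid with P <= Q. This is not the scheme proposed in the paper. The function name `suggest_hpl_params`, the 80% memory fraction, and the default NB of 192 are assumptions chosen for illustration and would need tuning on a real cluster.

```python
import math


def suggest_hpl_params(total_mem_bytes, num_procs, nb=192, mem_fraction=0.8):
    """Rule-of-thumb starting values for HPL.dat (illustrative, not the paper's scheme)."""
    # The N x N matrix holds 8-byte doubles and should occupy about
    # mem_fraction of the aggregate memory: 8 * N^2 <= mem_fraction * memory.
    n_max = math.isqrt(int(mem_fraction * total_mem_bytes) // 8)

    # Round N down to a multiple of the block size NB (assumed default: 192).
    n = (n_max // nb) * nb

    # Pick the process grid P x Q closest to square with P <= Q:
    # P is the largest divisor of num_procs not exceeding sqrt(num_procs).
    p = max(d for d in range(1, math.isqrt(num_procs) + 1) if num_procs % d == 0)
    q = num_procs // p

    return {"N": n, "NB": nb, "P": p, "Q": q}


if __name__ == "__main__":
    # Hypothetical cluster: 16 MPI processes sharing 64 GiB in total.
    print(suggest_hpl_params(64 * 2**30, 16))
```

The returned values are only a first guess; in practice one would still run a few tests around them, which is the kind of trial-and-error the proposed scheme aims to shorten.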
Funding: Supported by the National High Technology Research and Development 863 Program of China under Grant No. 2009AA01A128, the Major Science and Technology Project of China under Grant No. 2009ZX01036-001-003-001, and the National Natural Science Foundation of China under Grant Nos. 61003087, 60903044, 60903059, 60970033, and 60673150.
Abstract: In this paper we present the programming of the Linpack benchmark on the TianHe-1 system, the first petascale supercomputer system of China and the largest GPU-accelerated heterogeneous system attempted to date. A hybrid programming model consisting of MPI, OpenMP, and streaming computing is described to exploit the task, thread, and data parallelism of Linpack. We explain how we optimized the load distribution across the CPUs and GPUs using a two-level adaptive method and describe the implementation in detail. To overcome the low bandwidth of CPU-GPU communication, we present a software pipelining technique that hides the communication overhead. Combined with other traditional optimizations, the Linpack we developed achieved 196.7 GFLOPS on a single compute element of TianHe-1. This result is 70.1% of the peak compute capability and 3.3 times faster than the result obtained with the vendor's library. On the full configuration of TianHe-1, our optimizations resulted in a Linpack performance of 0.563 PFLOPS, which made TianHe-1 the 5th fastest supercomputer on the Top500 list in November 2009.
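The abstract does not spell out the two-level adaptive method, so the sketch below shows only the general idea of adaptive CPU/GPU load distribution: measure how long each device took on its share of the last step, then shift the split toward the share that would make both finish at the same time. It is a generic single-level illustration, not the authors' algorithm. The names `update_gpu_fraction` and `simulate`, the smoothing factor, and the 3:1 device speed ratio are all hypothetical.

```python
def update_gpu_fraction(gpu_fraction, gpu_time, cpu_time, smoothing=0.5):
    """Return a new GPU work share from the last step's measured times.

    Assumes 0 < gpu_fraction < 1 and strictly positive times.
    """
    # Observed throughput of each device: share of work divided by time taken.
    gpu_rate = gpu_fraction / gpu_time
    cpu_rate = (1.0 - gpu_fraction) / cpu_time

    # Share at which both devices would finish simultaneously.
    target = gpu_rate / (gpu_rate + cpu_rate)

    # Blend with the previous share to damp noisy measurements.
    return (1.0 - smoothing) * gpu_fraction + smoothing * target


def simulate(gpu_speed, cpu_speed, steps=20, start=0.5):
    """Toy model: each device's time is its work share divided by a fixed speed."""
    fraction = start
    for _ in range(steps):
        gpu_time = fraction / gpu_speed
        cpu_time = (1.0 - fraction) / cpu_speed
        fraction = update_gpu_fraction(fraction, gpu_time, cpu_time)
    return fraction


if __name__ == "__main__":
    # Hypothetical devices: the GPU is three times as fast as the CPU,
    # so the balanced GPU share is 3 / (3 + 1).
    print(round(simulate(gpu_speed=3.0, cpu_speed=1.0), 4))
```

In a real hybrid Linpack the measured times would come from timing the matrix-update kernels on each device, and the achievable speeds vary with block size, which is one reason an adaptive rather than fixed split is attractive.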