Abstract
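Taking the fast multipole method (FMM) for the N-body problem, a classic problem in high-performance computing, as an example, this paper analyzes each step of the FMM algorithm and classifies its sub-procedures by their computation, communication, and storage characteristics. The sub-procedures are tested on CPU, GPU, FPGA, and CELL, and a hybrid reconfigurable architecture configuration for running FMM is proposed. The algorithm is further optimized by decomposing its task flow, and feasible solutions are proposed according to the characteristics of the different task flows. The results show that the scheme improves the efficiency of the algorithm.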
Accelerators are increasingly viewed as coprocessors that can provide significant computational performance at low cost. Based on an analysis of computational, storage, and communication characteristics, this paper implements and tests every sub-procedure of the Fast Multipole Method (FMM) on GPU, FPGA, and CELL. It makes two contributions to optimizing FMM. First, a hybrid reconfigurable computer architecture that runs FMM well is presented. Second, FMM is optimized on this architecture by decomposing its task flow. Feasible solutions for the different task flows are also put forward on the basis of extensive experimental results. The results show that the scheme increases the efficiency of the algorithm.
Source
《计算机工程》
CAS
CSCD
2012, Issue 16, pp. 275-278, 283 (5 pages in total)
Computer Engineering
Funding
National "863" Program of China (2009AA012201-CFA2009SHDX01)
Shanghai Key Discipline Construction Fund (J50103)