摘要
快速傅里叶变换(Fast Fourier Transform,FFT)是最重要的基础算法之一,在科学计算、信号处理、图像处理等领域都有着广泛的应用。随着这些应用领域对实时性需求的进一步提高,FFT算法面临着越来越高的性能要求。在现有的FFT算法库中,FFT算法的求解速度和计算精度受到一定程度的限制,而且也少有研究者对偶数基Cooley-Tukey FFT的高性能实现提出相应的优化策略并对技术进行深入研究。基于此,文中提出了一套针对偶数基的Cooley-Tukey FFT的优化策略和方法。首先构建一个SIMD(Single Instruction Multiple Data)友好、支持混合基的蝶形网络,然后根据偶数基旋转因子特性最大限度地降低蝶形计算的复杂度,接着通过SIMD汇编优化、汇编指令重排及选择、寄存器分配策略制定、高性能矩阵转置算法等方法来优化应用,最后实现一个高性能的FFT算法库。目前,最流行、应用最广的FFT有FFTW和Intel MKL。实验结果表明,在X86计算平台上,新提出的这套针对偶数基Cooley-Tukey FFT的技术所实现的FFT算法库的性能全面优于MKL和FFTW。所提出的这套高性能算法优化和实现技术体系,可推广到除偶数基以外的其他基的实现和优化上,为进一步的研究开发工作奠定一定的基础,进而突破FFT算法在硬件平台上的性能瓶颈,实现一套针对特定平台的高性能FFT算法库。
Fast Fourier transform(FFT)is one of the most important basic algorithms,which is widely used in scientific calculation,signal processing,image processing and other fields.With the further improvement of real-time requirements in these application fields,fast Fourier transform algorithms are facing higher and higher performance requirements.In the existing FFT algorithm library,the solution speed and calculation accuracy of FFT algorithm are limited to a certain extent,and few researchers put forward corresponding optimization strategies and conducted in-depth research on the implementation of cooley-tukey fast Fourier transform based on even Numbers.Based on this,this paper put forward a set of for even basis of optimization strategy and method for Colley-Turkey fast Fourier transform.Firstly,a friendly butterfly network supporting SIMD mixed is constructed.Secondly,according to the even base rotation factor characteristics,the complexity of the butterfly calculation is reduced to a maximum degree.Thirdly,through the SIMD assembly optimization,assembly instruction rearrangement and selection,register allocation strategy and high performance matrix transpose algorithm method,the application is optimized.Finally a high performance FFT algorithm library is achieved.Currently,the most popular and widely used FFT are FFTW and Intel MKL.Experimental results show that on X86 computing platform,the performance of FFT library based on cooley-tukey FFT is better than MKL and FFTW.The high performance algorithm is put forward by the new optimization method and implementation technology system,which can be generalized to other except the even base based on the realization and optimization of a certain basis for further research and development work,to break through the FFT algorithm performance bottlenecks in the hardware platform,to achieve a high performance FFT algorithms library for a specific platform.
作者
龚彤艳
张广婷
贾海鹏
袁良
GONG Tong-yan;ZHANG Guang-ting;JIA Hai-peng;YUAN Liang(School of Information,Guizhou University of Finance and Economics,Guiyang 550025,China;State Key Laboratory of Computer Architecture,Institute of Computing Technology,Chinese Academy of Sciences,Beijing 100190,China)
出处
《计算机科学》
CSCD
北大核心
2020年第1期31-39,共9页
Computer Science
基金
国家重点研发计划(2018YFC0809306)
国家自然科学基金青年科学基金(61602443)
国家自然科学基金重点项目(61432018)~~
关键词
快速傅里叶变换算法
偶数基
蝶形计算优化
蝶形网络优化
SIMD汇编优化
高性能FFT库
Fast Fourier transform algorithm
Even basis
Butterfly calculation optimization
Butterfly network optimization
SIMD assembly optimization
High performance FFT library