Abstract
In recent years, Many Integrated Core (MIC) processors have attracted growing attention, and many-core architectures have become the first choice for many supercomputers. A BP neural network is an artificial neural network trained with the error back-propagation (BP) algorithm, and it places high demands on a processor's floating-point performance. The latest Intel Xeon Phi many-core processor, Knights Landing (KNL), reaches a peak double-precision floating-point performance of 3 TFLOPS. This paper vectorizes a BP neural network on KNL and optimizes it with register blocking and cache blocking. Experimental results show a peak processing speed of 220 img/s on KNL with a speedup of 13.2, which is 2.9 times that of the GPU and 2.28 times that of Knights Corner (KNC).
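The abstract names cache blocking and register blocking but this record does not include the paper's code. The following is a minimal C sketch of the general technique (loop tiling applied to the dense matrix product that dominates a fully connected layer), not the authors' implementation. The names `matmul_naive`, `matmul_blocked`, and the tile size `TILE` are hypothetical; a real tile size would be tuned to the target cache.

```c
#include <stddef.h>

/* Hypothetical tile edge; chosen small for illustration, not tuned for KNL. */
#define TILE 4

/* Reference version: C = A * B, all n x n, row-major. */
void matmul_naive(const float *A, const float *B, float *C, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t k = 0; k < n; ++k)
                acc += A[i * n + k] * B[k * n + j];
            C[i * n + j] = acc;
        }
    }
}

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Cache-blocked version: the three loops are tiled so that each
 * TILE x TILE sub-block of A, B, and C is reused while it is still
 * in cache. Handles n that is not a multiple of TILE. */
void matmul_blocked(const float *A, const float *B, float *C, size_t n) {
    for (size_t i = 0; i < n * n; ++i)
        C[i] = 0.0f;

    for (size_t ii = 0; ii < n; ii += TILE) {
        for (size_t kk = 0; kk < n; kk += TILE) {
            for (size_t jj = 0; jj < n; jj += TILE) {
                size_t i_end = min_sz(ii + TILE, n);
                size_t k_end = min_sz(kk + TILE, n);
                size_t j_end = min_sz(jj + TILE, n);
                for (size_t i = ii; i < i_end; ++i) {
                    for (size_t k = kk; k < k_end; ++k) {
                        /* Scalar reused across the whole inner loop, so the
                         * compiler can keep it in a register. */
                        float a = A[i * n + k];
                        /* Unit-stride inner loop: a candidate for
                         * compiler auto-vectorization. */
                        for (size_t j = jj; j < j_end; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
                }
            }
        }
    }
}
```

Both functions compute the same product; the blocked version only reorders the arithmetic so that data is reused before it is evicted. Full register blocking would go further and keep a small tile of `C` in registers across the `k` loop, which this sketch does not do.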
Source
Electronics World (《电子世界》)
2017, Issue 3, pp. 48-51 (4 pages)
Funding
National Natural Science Foundation of China (Grant No. 61571226)
Natural Science Foundation of Jiangsu Province, Youth Science Fund (Grant No. BK20140823)
Keywords
many-core architecture
BP neural network
cache blocking
vectorization