Router Design for an ARM Parallel Array Processor
Abstract: A communication structure matched to the architecture of an ARM parallel array processor is proposed. Four routers handle the communication among the 16 processor cores, which effectively saves chip area. The router uses packet-switched network-on-chip (NoC) communication and combines an internal caching (buffering) mechanism, the classic XY routing algorithm, a dedicated arbitration policy, and data multicast. The processors are low-power, high-performance ARM cores. Together, these mechanisms effectively reduce data propagation delay and power consumption. Experimental results show that a router designed with this scheme reaches a maximum clock frequency of 406.009 MHz, which meets the communication-rate requirements of the ARM array processor.
Source: Microelectronics & Computer (《微电子学与计算机》), indexed in CSCD and the Peking University Core Journals list, 2017, No. 2, pp. 73-76, 82 (5 pages)
Keywords: router; caching mechanism; classic XY routing algorithm; inter-core communication; data multicast
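
The abstract names the classic XY routing algorithm but does not describe the topology. The following is a minimal sketch of XY (dimension-order) routing, assuming the four routers form a 2x2 mesh and that each router serves four consecutive core IDs. The mesh layout, the core-to-router mapping, the port names, and all function names are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Tuple

# Assumptions (not from the paper): 4 routers in a 2x2 mesh,
# each router serving 4 consecutive core ids (16 cores in total).
MESH_WIDTH = 2
CORES_PER_ROUTER = 4
NUM_CORES = 16

Coord = Tuple[int, int]


def router_of(core_id: int) -> Coord:
    """Map a core id (0..15) to the (x, y) of its router (hypothetical mapping)."""
    if not 0 <= core_id < NUM_CORES:
        raise ValueError(f"core id out of range: {core_id}")
    router_index = core_id // CORES_PER_ROUTER
    return (router_index % MESH_WIDTH, router_index // MESH_WIDTH)


def xy_next_hop(current: Coord, destination: Coord) -> str:
    """XY routing: correct the X coordinate first, then Y, then deliver locally."""
    cx, cy = current
    dx, dy = destination
    if cx != dx:
        return "EAST" if dx > cx else "WEST"
    if cy != dy:
        return "NORTH" if dy > cy else "SOUTH"
    return "LOCAL"


def xy_route(src_core: int, dst_core: int) -> List[Coord]:
    """Return the list of routers a packet visits from src_core to dst_core."""
    current = router_of(src_core)
    destination = router_of(dst_core)
    path = [current]
    while current != destination:
        port = xy_next_hop(current, destination)
        x, y = current
        current = {
            "EAST": (x + 1, y),
            "WEST": (x - 1, y),
            "NORTH": (x, y + 1),
            "SOUTH": (x, y - 1),
        }[port]
        path.append(current)
    return path
```

Under these assumptions, `xy_route(0, 15)` travels along X before Y, and two cores attached to the same router exchange packets through that router's local port without any mesh hop. Because a packet never turns from the Y dimension back to X, XY routing on a mesh is deadlock-free, which is one common reason it is chosen for small on-chip networks.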