A Global Hierarchical Adaptive Routing Mechanism in Many-Core Processor Network-on-Chip

Cited by: 2

Abstract: With the arrival of the big data era, data centers have become part of everyday infrastructure. Many-core processors provide the high degree of parallelism needed to run data center applications such as sorting and searching efficiently. To exploit that parallelism, the routing algorithm of the interconnection network becomes one of the most important issues in a many-core system. Mesh and ring are the most widely used network-on-chip topologies in many-core processors because they are simple to implement and easy to scale. Depending on the scope of the congestion information they use, routing algorithms for mesh and ring can be divided into oblivious, local adaptive, regional adaptive, and global adaptive routing. Oblivious routing in a mesh degrades load balance under heavy traffic, which shows up as lower throughput and higher packet latency. Local and regional adaptive routing both suffer from short-sightedness to varying degrees and are not suitable for large-scale meshes, while prior global adaptive routing is slow because of the large amount of route computation it requires. We propose a novel global hierarchical adaptive routing mechanism comprising a global congestion information propagation network, Roof-Mesh, and a global hierarchical adaptive routing algorithm (GHARA). Roof-Mesh provides a platform for sharing global congestion information hierarchically among all nodes of the network. Using the information supplied by Roof-Mesh, GHARA computes routes level by level over a partition of the whole network, from the large-region view down to neighboring nodes. This reduces the number of steps in global route computation, which in turn lowers average packet latency and raises saturation bandwidth. Experimental results show that GHARA outperforms other regional and global adaptive routing algorithms. Under synthetic traffic patterns, the average saturation bandwidth is 10.7% higher than that of the global adaptive algorithm GCA on an 8×8 mesh and 14.7% higher on a 16×16 mesh. When running the SPLASH-2 benchmark suite, packet latency improves by up to 40% over GCA, and by 14% on average.
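As a rough illustration of the level-by-level idea described in the abstract, the following Python sketch picks a next hop on a 2D mesh by first comparing the aggregated congestion of the regions that the candidate hops fall in, and only then comparing the congestion of the neighbor nodes themselves. It is not the paper's GHARA algorithm: the two-level partition, the `region_size` parameter, the congestion dictionary, and all function names are assumptions made for illustration, and the sketch ignores Roof-Mesh propagation and deadlock avoidance.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]  # (x, y) coordinates of a router in the mesh


def minimal_next_hops(cur: Node, dst: Node) -> List[Node]:
    """Return the neighbors of cur that lie on a minimal path to dst."""
    (x, y), (dx, dy) = cur, dst
    hops: List[Node] = []
    if dx != x:
        hops.append((x + (1 if dx > x else -1), y))
    if dy != y:
        hops.append((x, y + (1 if dy > y else -1)))
    return hops


def region_of(node: Node, region_size: int) -> Node:
    """Map a node to the square region (hypothetical partition) containing it."""
    return (node[0] // region_size, node[1] // region_size)


def choose_next_hop(
    cur: Node,
    dst: Node,
    node_congestion: Dict[Node, int],
    region_size: int,
) -> Node:
    """Two-level choice: region congestion first, neighbor congestion second.

    node_congestion is an assumed global view (node -> load); nodes that are
    missing from the dictionary are treated as having zero load.
    """
    candidates = minimal_next_hops(cur, dst)
    if not candidates:
        return cur  # already at the destination

    # Coarse level: sum the load of every node inside each region.
    region_load: Dict[Node, int] = {}
    for node, load in node_congestion.items():
        region = region_of(node, region_size)
        region_load[region] = region_load.get(region, 0) + load

    # Fine level: among equally loaded regions, prefer the less loaded neighbor.
    def cost(hop: Node) -> Tuple[int, int]:
        return (
            region_load.get(region_of(hop, region_size), 0),
            node_congestion.get(hop, 0),
        )

    return min(candidates, key=cost)
```

In this sketch, a packet at `(1, 1)` heading to `(3, 3)` with `region_size=2` steers away from a region that contains a heavily loaded node, even when both immediate neighbors are idle. That is the kind of short-sightedness a purely local scheme cannot avoid.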

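Authors: Zhang Yang (张洋), Wang Da (王达), Ye Xiaochun (叶笑春), Zhu Yatao (朱亚涛), Fan Dongrui (范东睿), Li Hongliang (李宏亮), Xie Xianghui (谢向辉)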

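Affiliations: State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences); School of Computer and Control Engineering, University of Chinese Academy of Sciences; College of Information Science and Technology, Hebei Agricultural University; State Key Laboratory of Mathematical Engineering and Advanced Computing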

Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2016, No. 6, pp. 1211-1220 (10 pages)

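Funding: National Basic Research Program of China (973 Program) (2011CB302501); National Natural Science Foundation of China (61332009, 61173007, 61221062); National Science and Technology Major Project "Core Electronic Devices, High-End Generic Chips and Basic Software" (2013ZX0102-8001-001-001); National High Technology Research and Development Program of China (863 Program) (2015AA011204, 2012AA010901)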

Keywords: many-core processor; network-on-chip; load balance; global congestion information propagation network; global hierarchical adaptive routing algorithm (GHARA); Roof-Mesh

Classification code: TP302 [Automation and Computer Technology: Computer Architecture]
