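Research on a Dynamic Implicit Isolation Mechanism for Shared Cache in On-Chip Many-Core Architecture (一种片上众核结构共享Cache动态隐式隔离机制研究). Cited by: 3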

An Implicitly Dynamic Shared Cache Isolation in Many-Core Architecture


Abstract: Memory bandwidth is a key limit on the performance of many-core processors, so it is necessary to share the last-level on-chip cache among all processor cores; sharing also eliminates the capacity wasted when multiple copies of one data block reside in private caches. In an on-chip many-core architecture, however, conflicts in the shared cache are more serious than in a traditional single-processor architecture, so isolating conflicting data blocks is crucial to shared-cache performance. This paper proposes a hardware approach, block agglutinating (缓存块链接), that isolates the data blocks of different threads in the shared cache. The scheme is evaluated on a cycle-accurate many-core processor simulator with the Splash2 benchmarks and bioinformatics workloads. Experimental results show that, compared with a traditional shared cache, block agglutinating reduces the conflict miss rate of the shared cache by about 20% and improves IPC by about 10% on average.

Authors: Song Fenglong (宋风龙), Liu Zhiyong (刘志勇), Fan Dongrui (范东睿), Zhang Junchao (张军超), Yu Lei (余磊)

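Affiliations: Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences; Graduate University of Chinese Academy of Sciences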

Source: Chinese Journal of Computers (《计算机学报》), 2009, Issue 10, pp. 1896-1904 (9 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.

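Funding: Key Program of the National Natural Science Foundation of China (60736012); National Basic Research Program of China (973 Program) (2005CB321600); National High Technology Research and Development Program of China (863 Program) (2009AA01Z103); Beijing Natural Science Foundation (4092044)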

Keywords: many-core architecture; shared cache; conflict; resource isolation; cache partition

Classification code: TP302 [Automation and Computer Technology: Computer System Architecture]




