
An Implicitly Dynamic Shared Cache Isolation in Many-Core Architecture (original title: 一种片上众核结构共享Cache动态隐式隔离机制研究)

Cited by: 3
Abstract: Memory bandwidth is a key limit on the performance of many-core processors, so it is necessary to design the on-chip last-level cache as a cache shared by all cores, which also avoids the capacity wasted by holding multiple copies of one data block in private caches. In an on-chip many-core architecture, however, conflicts in the shared cache are more severe than in a traditional single-processor architecture, and isolating conflicting data blocks in the shared cache is crucial to its performance. This paper proposes a novel hardware approach, block agglutinating (cache block linking), to isolate the blocks of different threads in the shared cache. The proposed scheme is evaluated on a cycle-accurate many-core processor simulator using the Splash2 benchmarks and bioinformatics workloads. Experimental results show that, compared with a traditional shared cache, block agglutinating reduces the conflict miss rate of the shared cache by about 20% and improves IPC by about 10% on average.
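To make the notion of inter-thread conflict in a shared cache concrete, the following is a minimal sketch and not the paper's block agglutinating mechanism, whose details the abstract does not give. It models a single 4-way LRU cache set under two assumptions of ours: thread A reuses two blocks while thread B streams through new blocks, and isolation is approximated by a static split of the ways between the threads. All names (`LRUSet`, `make_trace`, `count_misses_for_a`) are hypothetical.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set with a fixed number of ways and LRU replacement."""

    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()  # oldest entry first

    def access(self, tag):
        """Return True on a hit; on a miss, fill the block and return False."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)  # mark as most recently used
            return True
        if len(self.blocks) >= self.ways:
            self.blocks.popitem(last=False)  # evict the least recently used block
        self.blocks[tag] = None
        return False


def make_trace(rounds=100):
    """Thread A reuses blocks 0 and 1; thread B touches 4 new blocks per round."""
    trace = []
    stream = 0
    for _ in range(rounds):
        trace += [("A", 0), ("A", 1)]
        for _ in range(4):
            trace.append(("B", stream))
            stream += 1
    return trace


def count_misses_for_a(trace, isolated, ways=4):
    """Count thread A's misses in one set, shared or statically split in half."""
    if isolated:
        # Assumption: isolation modeled as a static half/half way partition.
        sets = {"A": LRUSet(ways // 2), "B": LRUSet(ways // 2)}
    else:
        shared = LRUSet(ways)
        sets = {"A": shared, "B": shared}
    misses = 0
    for thread, tag in trace:
        hit = sets[thread].access((thread, tag))
        if thread == "A" and not hit:
            misses += 1
    return misses


if __name__ == "__main__":
    trace = make_trace()
    print("shared:  ", count_misses_for_a(trace, isolated=False))
    print("isolated:", count_misses_for_a(trace, isolated=True))
```

In this toy trace, the streaming thread evicts thread A's two blocks every round when the set is shared, so A misses on every access; once the threads are kept apart, A misses only on its first two accesses. The figures illustrate the effect only and say nothing about the 20% and 10% results reported in the abstract.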
Source: Chinese Journal of Computers (《计算机学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2009, No. 10, pp. 1896-1904 (9 pages)
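Funding: Supported by the Key Program of the National Natural Science Foundation of China (60736012), the National Basic Research Program of China ("973" Program) (2005CB321600), the National High Technology Research and Development Program of China ("863" Program) (2009AA01Z103), and the Beijing Natural Science Foundation (4092044)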
Keywords: many-core architecture; shared cache; data conflict; resource isolation; cache partition
